Abstract: Observables which discriminate boosted topologies from massive QCD jets are
of great importance for the success of the jet substructure program at the Large Hadron
Collider. Such observables, while both widely and successfully used, have been studied
almost exclusively with Monte Carlo simulations. In this paper we present the first all-orders
factorization theorem for a two-prong discriminant based on a jet shape variable, $D_2$, valid
for both signal and background jets. Our factorization theorem simultaneously describes
the production of both collinear and soft subjets, and we introduce a novel zero-bin
procedure to correctly describe the transition region between these limits. By proving an
all-orders factorization theorem, we enable a systematically improvable description, and allow
for precision comparisons between data, Monte Carlo, and first principles QCD calculations
for jet substructure observables. Using our factorization theorem, we present numerical
results for the discrimination of a boosted Z boson from massive QCD background jets. We
compare our results with Monte Carlo predictions, which allows for a detailed understanding
of the extent to which these generators accurately describe the formation of two-prong
QCD jets, and informs their usage in substructure analyses. Our calculation also provides
considerable insight into the discrimination power and calculability of jet substructure
observables in general.

1 Introduction

The last several years have seen a surge of interest in the field of jet substructure [1-4],
both as an essential tool for extending new physics searches at the Large Hadron Collider
(LHC) into the TeV energy regime, and as an important playground for improving our
understanding of high energy QCD, both perturbative and non-perturbative. Of particular
phenomenological interest are substructure observables that are sensitive to hard subjets
within a jet. In the highly boosted regime, the hadronic decay products of electroweak-scale
particles can become collimated and each appear as a jet in the detector. Unlike
typical massive QCD jets, however, these boosted electroweak jets exhibit a multi-prong
substructure that can be identified by the measurement of appropriate observables. Many
such observables have been proposed and studied on LHC simulation or data [5-27] or used
in new physics searches [28-40].

The vast majority of proposed jet substructure observables, however, have been analyzed
exclusively within Monte Carlo simulation. While Monte Carlos play an essential role
in the simulation of realistic hadron collision events, they can often obscure the underlying
physics that governs the behavior of a particular observable. Additionally, it is challenging
to disentangle perturbative physics from the tuning of non-perturbative physics so as to
understand how to systematically improve the accuracy of the Monte Carlo. Recently,
there has been an increasing number of analytical studies of jet substructure observables,
including the calculation of the signal distribution for N-subjettiness to
next-to-next-to-next-to-leading-log order [41], a fixed-order prediction for planar flow [42],
calculations of groomed jet masses [43-46] and the jet profile/shape [47-53] for both signal
and background jets, an analytic understanding of jet charge [54, 55], predictions for
fractional jet multiplicity [56], and calculations of the associated subjet rate [57].
Especially in the case of the groomed jet observables, analytic predictions informed the
construction of more performant and easier to calculate observables. With the recent start
of Run 2 of the LHC, where the phase space for high energy jets only grows, it will be
increasingly important to have analytical calculations to guide experimental understanding
of jet dynamics.

It is well known that the measurement of observables on a jet can introduce ratios of 
hierarchical scales appearing in logarithms at every order in the perturbative expansion. 
Accurate predictions over all of phase space require resummation of these large logarithms 
to all orders in perturbation theory. While this resummation is well understood for simple 
observables such as the jet mass [58-62], where it has been performed to high accuracy, a 
similar level of analytic understanding has not yet been achieved for more complicated jet 
substructure observables. Jet substructure observables are typically sensitive to a multitude 
of scales, corresponding to characteristic features of the jet, resulting in a much more subtle 
procedure for resummation. 

A ubiquitous feature of some of the most powerful observables used for identification
of jet substructure is that they are formed from the ratio of infrared and collinear (IRC)
safe observables. Examples of such observables include ratios of N-subjettiness variables
[63, 64], ratios of energy correlation functions [65-67], or planar flow [68]. In general,
ratios of IRC safe observables are not themselves IRC safe [69] and cannot be calculated
to any fixed order in perturbative QCD. Nevertheless, it has been shown that these ratio
observables are calculable in resummed perturbation theory and are therefore referred
to as Sudakov safe [70-73]. Distributions of Sudakov safe observables can be calculated
by appropriately marginalizing resummed multi-differential cross sections of IRC safe
observables. An understanding of the factorization properties of multi-differential jet cross
sections has been presented in Refs. [74-76] by identifying distinct factorization theorems
in parametrically separated phase space regions defined by the measurements performed
on the jet. Combining this understanding of multi-differential factorization with the
required effective field theories, all ingredients are now available for analytic resummation
and systematically improvable predictions.

As an explicit example, observables that resolve two-prong substructure are sensitive to
both the scales characterizing the subjets as well as to the scales characterizing the full jet.
A study of the resummation necessary for describing jets with a two-prong substructure
was initiated in Ref. [77], which considered the region of phase space with two collinear
subjets of comparable energy, and introduced an effective field theory description capturing
all relevant scales of the problem. Recently, an effective field theory description for the
region of two-prong jet phase space with a hard core and a soft, wide angle subjet was
developed in Ref. [76], where it was applied to the resummation of non-global logarithms
[78]. Combined, the collinear subjet and soft subjet factorization theorems allow for a
complete description of the dominant dynamics of jets with two-prong substructure.

In this paper we will study the factorization and resummation of the jet substructure
observable $D_2$ [66], a ratio-type observable formed from the energy correlation functions.
We will give a detailed effective theory analysis using the language of soft-collinear effective
theory (SCET) [79-82] in all regions of phase space required for the description of a one-
or two-prong jet, and will prove all-orders leading-power factorization theorems in each
region. We will then use these factorization theorems to calculate the $D_2$ distribution for
jets initiated by boosted hadronic decays of electroweak bosons or from light QCD partons
and compare to Monte Carlo simulation. These calculations will also allow us to make
first-principles predictions for the efficiency of the observable $D_2$ to discriminate boosted
electroweak signal jets from QCD background jets.

Figure 1: Comparison of our analytic calculation with Vincia Monte Carlo predictions
for the two-prong discriminant, $D_2$. Predictions for both boosted Z bosons and massive
QCD jets at a 1 TeV $e^+e^-$ collider are shown. The Monte Carlo is fully hadronized, and
non-perturbative effects have been included in the analytic calculation through a shape
function. In a) we show the complete distribution, and in b) we zoom in to focus on the
region relevant for boosted Z discrimination.

Our factorized description is valid to all orders in $\alpha_s$, expressing the cross section as
a product of field theoretic matrix elements, each of which is calculable order by order in
perturbation theory, allowing for a systematically improvable description of the $D_2$
observable. Furthermore, the factorization theorem enables a clean separation of perturbative and
non-perturbative physics, allowing for non-perturbative contributions to the observable to
be included in the analytic calculation through the use of shape functions [83, 84]. In this
paper we work to next-to-leading logarithmic (NLL) accuracy to demonstrate all aspects
of the required factorization theorems necessary for precision jet substructure predictions.
We will see that even at this first non-trivial order, we gain insight into qualitative and
quantitative features of the $D_2$ distribution. While we will give an extensive discussion
of our numerical results and comparisons with a variety of Monte Carlo programs in this
paper, in Fig. 1 we compare our analytic predictions for the $D_2$ observable, including
non-perturbative effects, for hadronically-decaying boosted Z bosons and QCD jets in $e^+e^-$
collisions with the distributions predicted by the Vincia [85-90] Monte Carlo program at
hadron level. Excellent agreement between analytic and Monte Carlo predictions is
observed, demonstrating a quantitative understanding of boosted jet observables from first
principles.

1.1 Overview of the Paper

While there exists a large number of two-prong discriminants in the jet substructure
literature, any of which would be interesting to understand analytically, we will use calculability
and factorizability as guides for constructing the observable to study in this paper. This
procedure will ultimately lead us to the observable $D_2$ and will demonstrate that $D_2$ has
particularly nice factorization and calculability properties. This approach will proceed in
the following steps:

1. Identify the relevant subjet configurations for the description of a two-prong
discriminant.

2. Isolate each of these relevant regions by the measurement of a collection of IRC safe 
observables. 

3. Study the phase space defined by this collection of IRC safe observables, and prove 
all-orders factorization theorems in each parametrically-defined region of phase space. 

4. Identify a two-prong discriminant formed from the collection of IRC safe observables 
which respects the parametric factorization theorems of the phase space. 

A detailed analysis of each of these steps will be the subject of this paper. Here, we provide 
a brief summary so that the logic of the approach is clear, and so that the reader can skip 
technical details in the different sections without missing the general idea of the approach. 

The complete description of an observable capable of discriminating one- from two-prong
substructure requires the factorized description of the following three relevant subjet
configurations, shown schematically in Fig. 2: 

• Soft Haze: Fig. 2a shows a jet in what we will refer to as the soft haze region of 
phase space. In the soft haze region there is no resolved subjet, only a single hard 
core with soft wide angle emissions. This region of phase space typically contains 
emissions beyond the strongly ordered limit, but is the dominant background region 
for QCD jets, for which a hard splitting is $\alpha_s$ suppressed.

• Collinear Subjets: Fig. 2b shows a jet with two hard, collinear subjets. Both 
subjets carry approximately half of the total energy of the jet, and have a small 
opening angle. This region of phase space, and its corresponding effective field theory 
description, has been studied in Ref. [77]. 

• Soft Subjet: Fig. 2c shows the soft subjet region of phase space which consists of 
jets with two subjets with hierarchical energies separated by an angle comparable to 
the jet radius $R$. The soft subjet probes the boundary of the jet and we take $R \sim 1$.
An effective field theory description for this region of phase space was presented in
Ref. [76].

As a basis of IRC safe observables for isolating these three subjet configurations, we
use the energy correlation functions [65], which we define in Sec. 2.1. In particular, we
will show that the measurement of three energy correlation functions, two 2-point energy
correlation functions $e_2^{(\alpha)}$, $e_2^{(\beta)}$, and a 3-point energy correlation function
$e_3^{(\alpha)}$, allows for parametric separation of the different subjet configurations. While
we will focus on the particular case of observables formed from the energy correlation
functions, we believe that this approach is more general and could be applied to other IRC
safe observable bases.

Figure 2: Regions of interest for studying the two-prong substructure of a jet. a) Soft
haze region in which no subjets are resolved. b) Collinear subjets with comparable energy
and a small opening angle. c) Soft subjet carrying a small fraction of the total energy, and
at a wide angle from the hard subjet.

With the energy correlation functions as our basis, we study the multi-differential phase 
space defined by the simultaneous measurement of these observables on a jet in Sec. 2. 
Using the power counting technique of Refs. [66, 67], we show that the angular exponents 
of the energy correlation functions, $\alpha$ and $\beta$, can be chosen such that the different subjet
configurations occupy parametrically separated regions of this phase space, and extend to 
all boundaries of the phase space. This parametric separation allows for each region to be 
separately described by its own effective field theory. The required effective field theories are 
described in Sec. 3, and are formulated in the language of SCET. The formulation in SCET 
allows us to prove all-orders factorization theorems valid at leading-power in each of the 
phase space regions, and to resum logarithms to arbitrary accuracy using renormalization 
group techniques. 

Having understood in detail both the structure of the phase space defined by the IRC
safe measurements as well as the factorization theorems defined in each region, we will
show in Sec. 4.1 that this leads unambiguously to the definition of a two-prong discriminant
observable which is amenable to factorization. This observable will be a generalized form of
$D_2$ [66] which will depend on both angular exponents $\alpha$ and $\beta$. Calculating the distribution
of $D_2$ is accomplished by appropriate marginalization of the multi-differential cross section.
Depending on the phase space cuts that have been made, $D_2$ may or may not be IRC
safe itself, and so the marginalization will in general only be defined within resummed
perturbation theory.
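
Schematically, the marginalization can be written as a projection of the double-differential cross section onto the discriminant. The expression below is a sketch only: it assumes that the generalized discriminant takes the form $D_2 = e_3^{(\alpha)}/(e_2^{(\beta)})^{3\alpha/\beta}$, and it suppresses any additional measurements or cuts (such as a jet mass cut) that would appear as further constraints under the integral.

```latex
\frac{d\sigma}{dD_2}
  = \int de_2^{(\beta)}\, de_3^{(\alpha)}\;
    \delta\!\left( D_2 - \frac{e_3^{(\alpha)}}{\big(e_2^{(\beta)}\big)^{3\alpha/\beta}} \right)
    \frac{d^2\sigma}{de_2^{(\beta)}\, de_3^{(\alpha)}}
```

If the integral over $e_2^{(\beta)}$ extends down to zero, a fixed-order double-differential cross section is not sufficient, which is why the marginalization is in general only defined after resummation.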

The outline of this paper is as follows. In Sec. 2 we define the energy correlation
functions used in this paper and describe how the different subjet configurations shown
schematically in Fig. 2 can be isolated by demanding parametric relations between the
measured values of these observables. In Sec. 3 we discuss the effective field theory
descriptions in the different phase space regions, and present the factorization theorems that
describe their dynamics. Although some of the relevant effective field theories have been
presented elsewhere, we attempt to keep the discussion self-contained by providing a brief
review of their most salient features. All field theoretic definitions of the functions
appearing in the factorization theorems, as well as their calculations to one-loop accuracy, are
provided in appendices.

In Sec. 4 we show how the detailed understanding of the multi-differential phase space
leads to the definition of $D_2$ as a powerful one- versus two-prong jet discriminant. In
Sec. 4.2 we emphasize that without a mass cut, $D_2$ is not IRC safe but is Sudakov safe, and
that its all-orders distribution exhibits paradoxical dependence on $\alpha_s$. In Sec. 4.3 we study
the fixed-order distribution of $D_2$ in the presence of a mass cut to understand its behavior
in singular limits. In Sec. 4.4 we discuss how the different effective field theories can be
consistently merged to give a factorized description of the $D_2$ observable, and introduce a
novel zero-bin procedure to implement this merging.

In Sec. 5 we present numerical results for both signal and background distributions
for $D_2$ as measured in $e^+e^-$ collisions and compare our analytic calculation with several
Monte Carlo generators. We emphasize many features of the calculation which provide
considerable insight into two-prong discrimination, and the ability of current Monte Carlo
generators to accurately describe substructure observables. In Sec. 6 we discuss numerical
results for the $D_2$ observable in $e^+e^-$ collisions at the Z pole at LEP, and demonstrate that,
being sensitive to correlations between three emissions, the $D_2$ observable can be used as
a more differential probe of the perturbative shower for tuning Monte Carlo generators. In
Sec. 7 we discuss how to extend our calculations to pp collisions at the LHC. We conclude
in Sec. 8, and discuss future directions for further improving the analytic understanding of
jet substructure.

2 Characterizing a Two-Prong Jet 

In this section, we develop the framework necessary to construct the all-orders factorization
theorems for analytic two-prong discrimination predictions. We begin in Sec. 2.1 by defining
the energy correlation functions, which we will use to isolate the three subjet configurations
discussed in the introduction. Using the power counting analysis of Refs. [66, 67], we
study the phase space defined by measuring the energy correlation functions in Sec. 2.2.
Throughout this paper, our proxy for a two-prong jet will be a boosted, hadronically
decaying Z boson, but our analysis holds for W or H bosons, as well.

2.1 Observable Definitions


To distinguish the three different subjet configurations of Fig. 2 with IRC safe
measurements, observables which are sensitive to both one- and two-prong structure are required.
Although many possible observable bases exist, in this paper we will use the energy
correlation functions [65, 66], as we will find that they provide a convenient basis.

The n-point energy correlation function is an IRC safe observable that is sensitive to
n-prong structure in a jet. For studying the two-prong structure of a jet, we will need the
2- and 3-point energy correlation functions, which we define for $e^+e^-$ collisions as [65]¹

\[
\begin{aligned}
e_2^{(\alpha)} &= \frac{1}{E_J^2} \sum_{i<j\in J} E_i E_j \left( \frac{2 p_i \cdot p_j}{E_i E_j} \right)^{\alpha/2}, \\
e_3^{(\alpha)} &= \frac{1}{E_J^3} \sum_{i<j<k\in J} E_i E_j E_k \left( \frac{2 p_i \cdot p_j}{E_i E_j} \right)^{\alpha/2} \left( \frac{2 p_i \cdot p_k}{E_i E_k} \right)^{\alpha/2} \left( \frac{2 p_j \cdot p_k}{E_j E_k} \right)^{\alpha/2}.
\end{aligned}
\tag{2.1}
\]

Here $J$ denotes the jet, $E_i$ and $p_i$ are the energy and four momentum of particle $i$ in the
jet, and $\alpha$ is an angular exponent that is required to be greater than 0 for IRC safety. The
4-point and higher energy correlation functions are defined as the natural generalizations
of Eq. (2.1), although we will not use them in this paper.
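
As a concrete illustration, the 2- and 3-point correlators can be evaluated directly from a list of particle four-momenta. The following is a minimal sketch, assuming the $e^+e^-$ form in which each pair of particles contributes an angular factor $2 p_i \cdot p_j/(E_i E_j)$ raised to half the angular exponent. The function names `angular_factor`, `ecf2`, `ecf3`, and `massless` are hypothetical and are not the interface of the EnergyCorrelator FastJet contrib.

```python
import itertools
import math


def angular_factor(p_i, p_j, exponent):
    """Return (2 p_i.p_j / (E_i E_j))**(exponent / 2) for momenta (E, px, py, pz)."""
    dot = p_i[0] * p_j[0] - sum(a * b for a, b in zip(p_i[1:], p_j[1:]))
    # Rounding can make the invariant slightly negative for collinear pairs.
    return max(2.0 * dot / (p_i[0] * p_j[0]), 0.0) ** (exponent / 2.0)


def ecf2(particles, exponent):
    """2-point energy correlation function, normalized by the jet energy squared."""
    jet_energy = sum(p[0] for p in particles)
    total = sum(
        p_i[0] * p_j[0] * angular_factor(p_i, p_j, exponent)
        for p_i, p_j in itertools.combinations(particles, 2)
    )
    return total / jet_energy**2


def ecf3(particles, exponent):
    """3-point energy correlation function, normalized by the jet energy cubed."""
    jet_energy = sum(p[0] for p in particles)
    total = sum(
        p_i[0] * p_j[0] * p_k[0]
        * angular_factor(p_i, p_j, exponent)
        * angular_factor(p_i, p_k, exponent)
        * angular_factor(p_j, p_k, exponent)
        for p_i, p_j, p_k in itertools.combinations(particles, 3)
    )
    return total / jet_energy**3


def massless(energy, theta, phi):
    """Massless four-momentum (E, px, py, pz) from an energy and polar angles."""
    return (
        energy,
        energy * math.sin(theta) * math.cos(phi),
        energy * math.sin(theta) * math.sin(phi),
        energy * math.cos(theta),
    )


# Toy three-particle jet; the energies and angles are illustrative only.
jet = [massless(60.0, 0.05, 0.0), massless(35.0, 0.30, 1.0), massless(5.0, 0.60, 2.5)]
print(ecf2(jet, 2.0), ecf3(jet, 2.0))
```

A jet with fewer than three particles has a vanishing 3-point correlator in this sketch, reflecting the sensitivity of the n-point function to n-prong structure.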

While we will mostly focus on the case of an $e^+e^-$ collider, the energy correlation
functions have natural generalizations to hadron colliders, by replacing $E$ by $p_T$ and using
hadron collider coordinates, $\eta$ and $\phi$. This definition is given explicitly in Eq. (7.1). At
central rapidity, this modification does not change the behavior of the observables, or
any of the conclusions presented in the next sections. Of course, the hadron collider
environment has other effects not present in an $e^+e^-$ collider, like initial state radiation
or underlying event, that will affect the energy correlation functions. A brief discussion
of the behavior of the energy correlation functions in pp colliders will be given in Sec. 7.
Numerical implementations of the energy correlation functions for both $e^+e^-$ and hadron
colliders are available in the EnergyCorrelator FastJet contrib [93, 94].

2.2 Power Counting the $e_2^{(\alpha)}$, $e_2^{(\beta)}$, $e_3^{(\alpha)}$ Phase Space

With a basis of IRC safe observables identified, we now demonstrate that the measurement 
of multiple energy correlation functions parametrically separates the three different subjet 
configurations identified in Fig. 2. In particular, the simultaneous measurement of
$e_2^{(\alpha)}$, $e_2^{(\beta)}$, and $e_3^{(\alpha)}$ is sufficient for this purpose, and we will study in
detail the phase space defined by their measurement. From this analysis, we will be able to
determine for which values of the angular exponents $\alpha$ and $\beta$ the three subjet
configurations are parametrically separated
within this phase space. The power counting parameters that define “parametric” will be 
set by the observables themselves, as is typical in effective field theory. 

1 For massive hadrons, there exist several possible definitions of the energy correlation functions
depending on the particular mass scheme [91, 92]. The definition in Eq. (2.1) is an E-scheme definition. A p-scheme
definition will be presented in Sec. 6 when we discuss the connection to LEP. Since the different definitions
are equivalent for massless partons, their perturbative calculations are identical. The different definitions
differ only in their non-perturbative corrections.

Figure 3: a) Schematic of a one-prong soft haze jet, dominated by collinear (blue) and
soft (green) radiation. The angular size of the collinear radiation is $\theta_{cc}$ and the energy
fraction of the soft radiation is $z_s$. b) Schematic of a jet resolved into two collinear subjets,
dominated by collinear (blue), soft (green), and collinear-soft (orange) radiation emitted
from the dipole formed by the two subjets. The subjets are separated by an angle $\theta_{12}$ and
the energy fraction of the collinear-soft radiation is $z_{cs}$.


We begin by considering how the energy correlation functions can be used to separate 
one- and two-prong jets. This has been previously discussed in Ref. [66] by measuring
$e_2^{(\beta)}$ and $e_3^{(\beta)}$, but here we consider the phase space defined by $e_2^{(\beta)}$ and
$e_3^{(\alpha)}$, with $\alpha$ and $\beta$ in
general different. A minimal constraint on the angular exponents, both for calculability
and discrimination power, is that the soft haze and collinear subjets configurations are 
parametrically separated by the measurements. A power counting analysis of the soft 
subjet region yields no new constraints beyond those from the soft haze or collinear subjets. 

The setup for the power counting analysis of the soft haze and collinear subjets
configurations is shown in Fig. 3, where all relevant modes are indicated. The one-prong jet
illustrated in Fig. 3a is described by soft modes with energy fraction $z_s$ emitted at $O(1)$
angles, and collinear modes with characteristic angular size $\theta_{cc}$ with $O(1)$ energy fraction.
The collinear subjets configuration illustrated in Fig. 3b consists of two subjets, each of
which carries an $O(1)$ fraction of the jet's energy, separated by an angle $\theta_{12} \ll 1$.
Each of the subjets has collinear emissions at a characteristic angle $\theta_{cc} \ll \theta_{12}$, and global
soft radiation at large angles with respect to the subjets, with characteristic energy
fraction $z_s \ll 1$. In the case of two collinear subjets arising from the decay of a color singlet
particle, the long wavelength global soft radiation is not present due to color coherence,
but the power counting arguments of this section remain otherwise unchanged. Finally,
there is radiation from the dipole formed from the two subjets (called "collinear-soft"
radiation), with characteristic angle $\theta_{12}$ from the subjets, and with energy fraction $z_{cs}$. The
effective theory of this phase space region for the observable N-jettiness [95] was studied
in Ref. [77].²

2 It is of historical interest to note that the generalization of two-prong event shapes, such as thrust, to

Table 1: Parametric scaling of the upper and lower boundaries of the one-prong region of
the $e_2^{(\beta)}$, $e_3^{(\alpha)}$ phase space as a function of the angular exponents $\alpha$ and $\beta$.


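We are now able to determine the parametric form of the dominant contributions to
the observables $e_2^{(\beta)}$ and $e_3^{(\alpha)}$. In the soft haze region, the dominant contributions to the
energy correlation functions are³

\[
\begin{aligned}
e_2^{(\beta)} &\sim z_s + \theta_{cc}^{\beta}, \\
e_3^{(\alpha)} &\sim \theta_{cc}^{3\alpha} + z_s^2 + \theta_{cc}^{\alpha} z_s.
\end{aligned}
\tag{2.2}
\]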
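
The soft haze scalings can be explored numerically. The sketch below assumes the leading parametric forms $e_2^{(\beta)} \sim z_s + \theta_{cc}^{\beta}$ and $e_3^{(\alpha)} \sim \theta_{cc}^{3\alpha} + z_s^2 + \theta_{cc}^{\alpha} z_s$ with every $O(1)$ coefficient set to one, so only the scaling behavior is meaningful. The function name `soft_haze_scaling` is hypothetical.

```python
def soft_haze_scaling(theta_cc, z_s, alpha, beta):
    """Leading parametric size of (e2^(beta), e3^(alpha)) for a one-prong jet.

    All O(1) coefficients are set to one (an assumption of this sketch).
    """
    e2 = z_s + theta_cc**beta
    e3 = theta_cc ** (3 * alpha) + z_s**2 + theta_cc**alpha * z_s
    return e2, e3


# Scan the collinear angle and soft energy fraction over several decades,
# with equal angular exponents alpha = beta = 1.
for theta_cc in (1e-3, 1e-2, 1e-1):
    for z_s in (1e-3, 1e-2, 1e-1):
        e2, e3 = soft_haze_scaling(theta_cc, z_s, alpha=1.0, beta=1.0)
        print(
            f"theta_cc={theta_cc:g} z_s={z_s:g} "
            f"e3/e2^2={e3 / e2**2:.3g} e3/e2^3={e3 / e2**3:.3g}"
        )
```

For equal angular exponents, the ratio $e_3/e_2^2$ never exceeds one in this toy scan, while $e_3/e_2^3$ stays bounded below by an $O(1)$ constant, so the one-prong points are confined between a quadratic and a cubic boundary.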


From these parametrics, it is straightforward to show that one-prong jets live in a region
of the $e_2^{(\beta)}$, $e_3^{(\alpha)}$ phase space bounded from above and below, whose precise scaling depends
on the relative size of the angular exponents $\alpha$ and $\beta$. The scaling of upper and lower
boundaries of the one-prong region of phase space for all $\alpha$ and $\beta$ are listed in Table 1. For
$\alpha = \beta$, as studied in Ref. [66], one-prong jets live in the region defined by

\[
\left( e_2^{(\beta)} \right)^3 \lesssim e_3^{(\beta)} \lesssim \left( e_2^{(\beta)} \right)^2 .
\]




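For the collinear subjets configuration, the dominant contributions to the observables
$e_2^{(\beta)}$ and $e_3^{(\alpha)}$ are

\[
\begin{aligned}
e_2^{(\beta)} &\sim \theta_{12}^{\beta}, \\
e_3^{(\alpha)} &\sim \theta_{cc}^{\alpha} \theta_{12}^{2\alpha} + \theta_{12}^{\alpha} z_s + \theta_{12}^{3\alpha} z_{cs} + z_s^2.
\end{aligned}
\tag{2.3}
\]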


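The 2-point energy correlation function $e_2^{(\beta)}$ is set by the angle of the hard splitting, $\theta_{12}$,
and the scaling of all other modes (soft, collinear, or collinear-soft) is set by the $e_3^{(\alpha)}$
measurement. The requirement

\[
z_{cs} \sim \frac{e_3^{(\alpha)}}{\left( e_2^{(\beta)} \right)^{3\alpha/\beta}} \ll 1,
\tag{2.4}
\]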


event shapes for characterizing three jet structure was considered early on, for example with the introduction 

of the triplicity event shape [96]. However, it was not until more recently, with the growth of the jet 
substructure field at the LHC, that significant theoretical study was given to such observables. 

3 It is important to understand that this relationship is valid to an arbitrary number of emissions. When 
performing the power counting, a summation over all the particles with soft and collinear scalings in the 
jet must be considered. However, to determine the scalings of the observable, it is sufficient to consider the 
scaling of the different types of individual terms in the sum. For example, the three terms contributing 
to the expression for $e_3^{(\alpha)}$ arise from correlations between subsets of three collinear particles, one collinear
particle and two soft particles, and two collinear particles and a soft particle, respectively. Contributions 
from other combinations of particles are power suppressed. Because of this simplification, in this paper we 
will never write explicit summations when discussing the scaling of observables.

then implies that the two-prong jets occupy the region of phase space defined by
$e_3^{(\alpha)} \ll \left( e_2^{(\beta)} \right)^{3\alpha/\beta}$.

For optimal discrimination, the one- and two-prong regions of this phase space should 
not overlap. Since they are physically distinct, a proper division of the phase space will 
allow distinct factorizations, simplifying calculations. Comparing the boundaries of the 
one-prong region listed in Table 1 with the upper boundary of the two-prong region from 
Eq. (2.4), we find that the one- and two-prong jets do not overlap with the following
restriction on the angular exponents $\alpha$ and $\beta$:

\[
3\alpha > 2\beta.
\tag{2.5}
\]

Note that when $\alpha = \beta$ this is satisfied, consistent with the analysis of Ref. [66].
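
To make the role of the collinear-soft constraint concrete, the sketch below inverts the leading collinear subjets scalings, $e_2^{(\beta)} \sim \theta_{12}^{\beta}$ and $e_3^{(\alpha)} \sim \theta_{cc}^{\alpha}\theta_{12}^{2\alpha} + \theta_{12}^{\alpha} z_s + \theta_{12}^{3\alpha} z_{cs}$, to estimate the mode scales from measured values. It assumes that each term separately saturates the measured $e_3^{(\alpha)}$ and drops all $O(1)$ factors. The function names and dictionary keys are hypothetical, and the hard threshold in `in_two_prong_region` only stands in for the parametric condition $z_{cs} \ll 1$.

```python
def collinear_subjet_mode_scales(e2_beta, e3_alpha, alpha, beta):
    """Parametric estimates of the mode scales in the collinear subjets region."""
    theta_12 = e2_beta ** (1.0 / beta)  # opening angle of the hard splitting
    return {
        "theta_12": theta_12,
        # Collinear emissions within a subjet: e3 ~ theta_cc^alpha theta_12^(2 alpha).
        "theta_cc": (e3_alpha / theta_12 ** (2 * alpha)) ** (1.0 / alpha),
        # Global soft radiation: e3 ~ theta_12^alpha z_s.
        "z_s": e3_alpha / theta_12**alpha,
        # Collinear-soft radiation from the subjet dipole: e3 ~ theta_12^(3 alpha) z_cs.
        "z_cs": e3_alpha / theta_12 ** (3 * alpha),
    }


def in_two_prong_region(e2_beta, e3_alpha, alpha, beta):
    """Crude stand-in for the parametric requirement z_cs << 1."""
    return collinear_subjet_mode_scales(e2_beta, e3_alpha, alpha, beta)["z_cs"] < 1.0


scales = collinear_subjet_mode_scales(e2_beta=0.1, e3_alpha=1e-5, alpha=2.0, beta=2.0)
print(scales)
print(in_two_prong_region(0.1, 1e-5, 2.0, 2.0))
```

In this estimate $\theta_{cc}/\theta_{12} = z_{cs}^{1/\alpha}$, so requiring a small collinear-soft energy fraction is equivalent to requiring that the collinear radiation be narrower than the subjet opening angle.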

Because these power counting arguments rely exclusively on the parametric behavior 
of QCD in the soft and collinear limits, they must be reproduced by any Monte Carlo 
simulation, regardless of its shower and hadronization models. To illustrate the robust 
boundary between the one- and two-prong regions of phase space predicted in Eq. (2.5), in
Fig. 4, we plot the distribution in the $e_2^{(\beta)}$, $e_3^{(\alpha)}$ plane of jets initiated by light QCD partons
and those from boosted hadronic decays of Z bosons as generated in $e^+e^-$ collisions in
Pythia [97, 98]. Details of the Monte Carlo generation are presented in Sec. 5. QCD
jets are dominantly one-pronged, while jets from Z decays are dominantly two-pronged.
We have chosen to use angular exponents $\alpha = \beta = 1$ for this plot, as the small value
of the angular exponent allows the structure of the phase space to be seen in a
non-logarithmic binning. The predicted behavior persists for all values of $\alpha$ and $\beta$ consistent
with Eq. (2.5), while the choice made here is simply for illustrative aesthetics. On these
plots, we have added dashed lines corresponding to the predicted one- and two-prong phase 
space boundaries to guide the eye. The one-prong QCD jets and the two-prong boosted Z 
jets indeed dominantly live in their respective phase space regions as predicted by power 
counting. 

Figure 4: Monte Carlo distributions in the $e_2^{(\beta)}$, $e_3^{(\alpha)}$ plane, for QCD quark jets (left) and
boosted $Z \to q\bar{q}$ jets (right). The parametric scalings predicted by the power counting
analysis are shown as dashed lines, and the one- and two-prong regions of phase space
are labelled, and extend between the parametric boundaries. Note the upper boundary is
constrained to have a maximal value of $\frac{1}{2}\left(e_2^{(\beta)}\right)^2 = e_3^{(\alpha)}$.

The measurement of $e_2^{(\beta)}$ and $e_3^{(\alpha)}$ alone is sufficient to separate one- and two-prong
jets. However, the two-prong jets can exhibit either collinear subjets or a soft, wide angle
subjet. To separate the collinear and soft subjet two-prong jets, we make an additional
IRC safe measurement on the full jet. Following Ref. [76], in addition to $e_2^{(\beta)}$ and $e_3^{(\alpha)}$, we
measure $e_2^{(\alpha)}$, with $\alpha \neq \beta$. In particular, the soft subjet and collinear subjet regions of
phase space are defined by the simple conditions

\[
\text{Collinear Subjet:} \quad e_2^{(\alpha)} \sim \left( e_2^{(\beta)} \right)^{\alpha/\beta},
\tag{2.6}
\]
\[
\text{Soft Subjet:} \quad e_2^{(\alpha)} \sim e_2^{(\beta)}.
\tag{2.7}
\]

For $\alpha \neq \beta$ and $e_2^{(\beta)} \ll 1$, these two regions are parametrically separated. Equivalently, in
the two-prong region of phase space the measurement of both $e_2^{(\alpha)}$ and $e_2^{(\beta)}$ can be used to
give IRC safe definitions to the subjet energy fraction and splitting angle, allowing the soft
subjet and collinear subjets to be distinguished. In Fig. 5 we summarize and illustrate the
measurements that we make on the jet and the parametric relations between the measured
values of the energy correlation functions that define the three phase space regions. The
phase space plots of Figs. 5b and 5c were also presented in Ref. [76].
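
The statement that two 2-point correlators with different angular exponents together fix the subjet energy fraction and splitting angle can be illustrated with a parametric estimate. The sketch below assumes the leading two-prong scaling $e_2^{(\gamma)} \sim z\,\theta^{\gamma}$ for a subjet of energy fraction $z$ at angle $\theta$, with $O(1)$ factors such as $(1-z)$ dropped. The function name `subjet_kinematics` is hypothetical.

```python
def subjet_kinematics(e2_alpha, e2_beta, alpha, beta):
    """Estimate (z, theta) from two 2-point correlators with different exponents.

    Assumes e2^(gamma) ~ z * theta**gamma, so the ratio of the two measurements
    isolates the angle and either measurement then fixes the energy fraction.
    """
    if alpha == beta:
        raise ValueError("alpha and beta must differ to separate z from theta")
    theta = (e2_alpha / e2_beta) ** (1.0 / (alpha - beta))
    z = e2_beta / theta**beta
    return z, theta


# Soft, wide-angle subjet: both correlators are of the same size.
print(subjet_kinematics(e2_alpha=0.05, e2_beta=0.05, alpha=4.0, beta=2.0))
# Narrower splitting: the correlator with the larger exponent is suppressed.
print(subjet_kinematics(e2_alpha=0.1 * 0.5**4, e2_beta=0.1 * 0.5**2, alpha=4.0, beta=2.0))
```

In this picture a soft subjet at $\theta \sim 1$ gives equal values for the two correlators, while collinear subjets with $z \sim 1$ give $e_2^{(\alpha)} \sim (e_2^{(\beta)})^{\alpha/\beta}$, matching the conditions of Eqs. (2.6) and (2.7).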


2.2.1 Jet Mass Cuts 


In addition to discriminating QCD jets from boosted Z bosons by their number of resolved 
prongs, we must also impose a mass cut on the jet to ensure that the jet is compatible with 
a Z decay. To include a mass cut in our analysis, for general angular exponents $\alpha$ and $\beta$,
we would need to measure four observables on the jet: $e_2^{(\alpha)}$, $e_2^{(\beta)}$, $e_3^{(\alpha)}$, and the jet mass.
This would significantly complicate calculations and introduce new parametric phase space
regions that would need to be understood. To avoid this difficulty, we note that, for our
definition of $e_2^{(\beta)}$ from Eq. (2.1), if all final state particles are massless, then

\[
e_2^{(2)} = \frac{m_J^2}{E_J^2},
\tag{2.8}
\]

where $m_J$ is the mass of the jet. Therefore, choosing $\beta = 2$ we can trivially impose a
mass cut within the framework developed here. Throughout the rest of this paper, we will
set $\beta = 2$ for this reason. Importantly, from Monte Carlo studies it has been shown that
$\beta \sim 2$ provides optimal discrimination power [65, 66], so this restriction does not limit the
phenomenological relevance of our results. 
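
Because $e_2^{(2)} = m_J^2/E_J^2$ for massless final state particles, a jet mass window translates directly into a window in $e_2^{(2)}$. The following minimal sketch shows the conversion; the function name `e2_window_from_mass_cut` and the numerical mass window and jet energy are illustrative assumptions, not values taken from the analysis.

```python
def e2_window_from_mass_cut(m_min, m_max, jet_energy):
    """Convert a jet mass window into the equivalent e2^(2) window.

    Uses e2^(2) = m_J^2 / E_J^2, valid when all final state particles are massless.
    Masses and the jet energy must be given in the same units.
    """
    if not 0.0 <= m_min <= m_max <= jet_energy:
        raise ValueError("require 0 <= m_min <= m_max <= jet_energy")
    return (m_min / jet_energy) ** 2, (m_max / jet_energy) ** 2


# Hypothetical example: an 80-100 GeV mass window for a 500 GeV jet.
low, high = e2_window_from_mass_cut(80.0, 100.0, 500.0)
print(low, high)
```

The window narrows quadratically as the jet energy grows, so a fixed mass window selects ever smaller values of $e_2^{(2)}$ for more highly boosted jets.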

Substituting the value (3 = 2 into the power counting condition of Eq. (2.5), we find 
that the one- and two-prong regions of phase space are separated if 



(2.9) 
Figure 5: a) Table summarizing the defining relations for the different subjet configurations
in terms of the energy correlation functions $e_2^{(\alpha)}$, $e_2^{(\beta)}$, $e_3^{(\alpha)}$. b) The one- and two-prong
jet regions in the $e_2^{(\beta)}, e_3^{(\alpha)}$ phase space. Jets with a two-prong structure lie in the lower
(orange) region of phase space, while jets with a one-prong structure lie in the upper (purple)
region of phase space. c) The projection onto the $e_2^{(\alpha)}, e_2^{(\beta)}$ phase space in which the
soft subjet and collinear subjets are separated.

To achieve a parametric separation of the one- and two-prong regions of phase space, we
will demand that the scalings defining the different regions be separated by at least a single
power of $e_2^{(\beta)}$. For example, choosing $\alpha = \beta = 2$, the scalings of the one-prong and
two-prong regions are $e_3^{(\alpha)} \sim (e_2^{(\beta)})^2$ and $e_3^{(\alpha)} \sim (e_2^{(\beta)})^3$, which are parametrically different.
We therefore restrict ourselves to the range of angular exponents

$$\beta = 2\,, \qquad \alpha \gtrsim 2\,. \tag{2.10}$$

We expect that for $\alpha < 2$ our effective field theory description will begin to break down,
while as $\alpha$ is increased above 2 it should improve.
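
To illustrate the parametric separation for $\alpha = \beta = 2$, the following sketch evaluates the ratio $e_3/(e_2)^3$ on two hand-built three-particle jets. It assumes the $e^+e^-$ forms of the two- and three-point correlators with angular weight $\left(2 p_i\cdot p_j/(E_i E_j)\right)^{\alpha/2}$, which are not restated in this section; the ratio is the usual equal-exponent form of $D_2$, quoted here as an assumption. The particle configurations and function names are hypothetical.

```python
import math
from itertools import combinations

EXPONENT = 2.0  # alpha = beta = 2, as in the example in the text


def massless(energy, theta, phi):
    """Four-momentum (E, px, py, pz) of a massless particle."""
    return (
        energy,
        energy * math.sin(theta) * math.cos(phi),
        energy * math.sin(theta) * math.sin(phi),
        energy * math.cos(theta),
    )


def angle_factor(p, q, exponent):
    """(2 p.q / (E_p E_q))^(exponent/2); reduces to theta^exponent at small angle."""
    dot = p[0] * q[0] - p[1] * q[1] - p[2] * q[2] - p[3] * q[3]
    return max(2.0 * dot / (p[0] * q[0]), 0.0) ** (exponent / 2.0)


def e2(jet, exponent):
    """Two-point energy correlation function (assumed e+e- definition)."""
    e_jet = sum(p[0] for p in jet)
    total = sum(
        p[0] * q[0] * angle_factor(p, q, exponent)
        for p, q in combinations(jet, 2)
    )
    return total / e_jet ** 2


def e3(jet, exponent):
    """Three-point energy correlation function (assumed e+e- definition)."""
    e_jet = sum(p[0] for p in jet)
    total = 0.0
    for p, q, r in combinations(jet, 3):
        total += (
            p[0] * q[0] * r[0]
            * angle_factor(p, q, exponent)
            * angle_factor(p, r, exponent)
            * angle_factor(q, r, exponent)
        )
    return total / e_jet ** 3


def e3_over_e2_cubed(jet):
    return e3(jet, EXPONENT) / e2(jet, EXPONENT) ** 3


# Two hard subjets at angle 0.2 plus a collinear emission off the first one.
two_prong = [
    massless(50.0, 0.0, 0.0),
    massless(50.0, 0.2, 0.0),
    massless(5.0, 0.02, math.pi),
]
# One hard core plus two soft wide-angle emissions.
one_prong = [
    massless(100.0, 0.0, 0.0),
    massless(1.0, 0.8, 0.0),
    massless(1.0, 0.9, math.pi),
]

ratio_two_prong = e3_over_e2_cubed(two_prong)
ratio_one_prong = e3_over_e2_cubed(one_prong)
print(ratio_two_prong, ratio_one_prong)
```

In this toy setup the two-prong jet gives a ratio well below one, while the one-prong jet gives a ratio well above one, in line with the scalings $e_3 \sim (e_2)^3$ and $e_3 \sim (e_2)^2$ bounding the two regions.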

3 Factorization and Effective Field Theory Analysis 

In each region of phase space identified in Sec. 2, hierarchies of scales associated with
the particular kinematic configuration of the jet appear. These include the soft subjet
energy fraction $z_{sj}$ in the soft subjet region of phase space, or the splitting angle $\theta_{12}$ of
the collinear subjets. Logarithms of these scales appear at each order in perturbation
theory, and need to be resummed to all orders to achieve reliable predictions. To perform 
this resummation, we will prove factorization theorems in each region of phase space by 
developing an effective field theory description which captures all the scales relevant to 
that particular region of phase space. These effective field theories are formulated in the 
language of SCET [79-82], but include additional modes which are required to describe 
the dynamics of the scales associated with the jet’s particular substructure. Resummation 
is then achieved by renormalization group evolution within the effective theory. 

In this section we discuss each of the effective theories required for a description of the
$e_2^{(\alpha)}, e_2^{(\beta)}, e_3^{(\alpha)}$ phase space. For each region of the phase space, we present an analysis of
the modes required in the effective field theory description and present the factorization
theorem. We also provide a brief discussion of the physics described by each of the functions 
appearing in the factorization theorem. Field theoretic operator definitions of the functions, 
as well as their calculation to one-loop accuracy, are presented in appendices. 

3.1 QCD Background 

Three distinct factorization theorems are required to describe the full phase space for
massive QCD jets, corresponding to the soft haze, collinear subjets, and soft subjet
configurations. Detailed expositions of the factorization theorems for the collinear subjets and
soft subjet configurations have been presented in Refs. [76, 77], but here we review the
important features of the factorization theorems to keep the discussion self-contained.

Throughout this section, all jets are defined using the $e^+e^-$ anti-$k_T$ clustering metric
[93, 99] with the Winner-Take-All (WTA) recombination scheme [72, 100]. To focus on the
aspects of the factorization relevant to the jet substructure, we will present the factorization
theorems for the specific case of $e^+e^- \to q\bar q$. The factorization theorem for gluon initiated
jets is identical to the quark case, and can be obtained using the ingredients in the
appendices. The extension to the production of additional jets or pp colliders will be
discussed in Sec. 7.

3.1.1 Collinear Subjets 

An effective field theory describing the collinear subjets configuration was first presented
in Ref. [77] and is referred to as SCET$_+$. We refer the interested reader to Ref. [77] for a
more detailed discussion, as well as a formal construction of the effective theory. To our
knowledge, our calculation is the first, other than that of Ref. [77], to use this effective
theory.

Mode Structure 

The modes of SCET$_+$ are global soft modes, two collinear sectors describing the radiation
in each of the collinear subjets, and collinear-soft modes from the dipole of the subjet
splitting. These are shown schematically in Fig. 6. The additional collinear-soft modes, as
compared with traditional SCET, are necessary to resum logarithms associated with the
subjets’ splitting angle. This angle, which is taken to be small, is not resolved by the long
wavelength global soft modes.

The parametric scalings of the observables in the collinear subjets region were given
in Sec. 2.2 and are:

$$e_2^{(\alpha)} \sim \theta_{12}^{\alpha}\,, \tag{3.1}$$

$$e_2^{(\beta)} \sim \theta_{12}^{\beta}\,, \tag{3.2}$$

$$e_3^{(\alpha)} \sim \theta_{12}^{\alpha} \left( \theta_c^{\alpha}\, \theta_{12}^{\alpha} + z_s + \theta_{12}^{2\alpha}\, z_{cs} \right). \tag{3.3}$$

Although the measurement of two 2-point energy correlation functions is required to be
able to distinguish the soft and collinear subjets, they are redundant in the collinear subjets
region from a power counting perspective, due to the relation $e_2^{(\alpha)} \sim (e_2^{(\beta)})^{\alpha/\beta}$. We
will therefore always write the scaling of the modes in terms of $e_2^{(\alpha)}$ and $e_3^{(\alpha)}$ to simplify
expressions.

From Eq. (3.1), we see that $e_2^{(\alpha)}$ sets the hard splitting scale, while the scalings of all
the modes are set by the measurement of $e_3^{(\alpha)}$. In particular, the scalings of the momenta
of the collinear and soft modes are given by



while the scaling of the collinear-soft mode is given by 


Pcs 




(3.6) 


Here $E_J$ is the energy of the jet, and the subscripts denote the light-like directions with
respect to which the momentum is decomposed. In the expressions above, the momenta
are written in the $(+, -, \perp)$ component basis with respect to the appropriate light-like
directions. The subjet directions are labelled by $n_a$ and $n_b$, while the fat jet (containing
the two subjets) and the recoiling jet are labelled by $n$ and $\bar n$. The relevant modes and a
schematic depiction of the hierarchy of their virtualities are shown in Fig. 6.
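
As a consistency check, the mode scalings can be recovered from the power counting of the observables. The following is a sketch rather than a restatement of Eqs. (3.4)-(3.6): it assumes $z \sim 1 - z \sim \frac{1}{2}$, takes $e_2^{(\alpha)} \sim \theta_{12}^{\alpha}$ and $e_3^{(\alpha)} \sim \theta_{12}^{\alpha}(\theta_c^{\alpha}\theta_{12}^{\alpha} + z_s + \theta_{12}^{2\alpha} z_{cs})$, requires each mode to contribute to $e_3^{(\alpha)}$ at leading power, and uses the generic light-cone scaling $E\,(\theta^2, 1, \theta)$ for radiation of energy $E$ at characteristic angle $\theta$. Light-cone direction labels are omitted.

```latex
\begin{align*}
\theta_{12}^{\alpha} &\sim e_2^{(\alpha)}, &
\theta_c^{\alpha} &\sim \frac{e_3^{(\alpha)}}{\big(e_2^{(\alpha)}\big)^{2}}, &
z_s &\sim \frac{e_3^{(\alpha)}}{e_2^{(\alpha)}}, &
z_{cs} &\sim \frac{e_3^{(\alpha)}}{\big(e_2^{(\alpha)}\big)^{3}},
\\[4pt]
p_c &\sim E_J \left(\theta_c^{2},\, 1,\, \theta_c\right), &
p_s &\sim z_s\, E_J \left(1,\, 1,\, 1\right), &
p_{cs} &\sim z_{cs}\, E_J \left(\theta_{12}^{2},\, 1,\, \theta_{12}\right).
\end{align*}
```

Under these assumptions, the requirements $\theta_c \ll \theta_{12}$ and $z_{cs} \ll 1$ both reduce to $e_3^{(\alpha)} \ll (e_2^{(\alpha)})^3$, while $z_s \ll z_{cs}$ follows from $e_2^{(\alpha)} \ll 1$.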

To have a valid soft and collinear expansion, the scalings of the modes in Eqs. (3.4) 
and (3.6) imply that 




and 



(ef) 3 ^ 


< 1 . 


This agrees with the boundaries of the phase space found in Sec. 2.2. 


(3.7) 
Figure 6: A schematic depiction of the collinear subjets configuration with dominant QCD
radiation and the functions describing its dynamics in the effective field theory is shown in
a). The matching procedure and relevant scales are shown in b), where we have restricted
to the case $\alpha = \beta = 2$ for simplicity.


Factorization Theorem 


In the collinear subjets region of phase space, the values of the 2-point energy correlation
functions $e_2^{(\alpha)}$ and $e_2^{(\beta)}$ are set by the hard splitting. To leading power, these observables
can be used to provide IRC safe definitions of the subjet energy fractions and the angle
between the subjets. We therefore write the factorization theorem in terms of $e_2^{(\alpha)}$, $e_3^{(\alpha)}$,
and the energy fraction of one of the subjets, which we denote by $z$. We further assume that
an IRC safe observable, $B$, is measured in the out-of-jet region. Dependence on $B$ enters
only into the out-of-jet jet function, and the out-of-jet contribution to the soft function.

The factorization theorem formulated in SCET$_+$ for the collinear subjets region of
phase space is given by



where we have suppressed the convolution over the out-of-jet measurement, $B$, for simplicity.
Here the $n_a$, $n_b$ denote the collinear directions of the subjets, and we assume that
$z \sim 1 - z \sim \frac{1}{2}$. The sum runs over all possible quark flavors that could be produced in an
$e^+e^-$ collision. A brief description of the functions entering the factorization theorem of
Eq. (3.8) is as follows:


• $H_{n\bar n}$ is the hard function describing the underlying short distance process. In this
case we consider $e^+e^- \to q\bar q$.

• $P^{f \to f_a f_b}_{n_t \to n_a, n_b}\left(z; e_2^{(\alpha)}\right)$ is the hard function arising from the matching for the hard splitting
into subjets. In this case the partonic channel $f \to f_a f_b$ is restricted to $q \to qg$.

• $J_{n_a}\left(z; e_3^{(\alpha)}\right)$, $J_{n_b}\left(1 - z; e_3^{(\alpha)}\right)$ are jet functions describing the collinear dynamics of the
subjets along the directions $n_a$, $n_b$.

• $S_{n\bar n}\left(e_3^{(\alpha)}; B; R\right)$ is the global soft function. The global soft modes do not resolve the
subjet splitting, and are sensitive only to two eikonal lines in the $n$ and $\bar n$ directions.
The soft function depends explicitly on the jet radius, $R$.

• $S^{+}_{n_a n_b \bar n}\left(e_3^{(\alpha)}\right)$ is the collinear-soft function. The collinear-soft modes resolve the subjet
splitting, and hence the function depends on three eikonal lines, namely $n_a$, $n_b$, $\bar n$.
Although these modes are soft, they are also boosted, and therefore do not resolve
the jet boundary, so that the collinear-soft function is independent of the jet radius,
$R$.

This factorization theorem is shown schematically in Fig. 6, which highlights the radiation
described by each of the functions in Eq. (3.8), as well as their virtuality scales. The
two stage matching procedure onto the SCET$_+$ effective theory, which proceeds through a
refactorization of the jet function, is also shown. The fact that the refactorization occurs
in the jet function is important in that it implies that it is independent of the global color
structure of the event, making it trivial to extend the factorization theorem to events with
additional jets. This matching procedure is discussed in detail in Ref. [77].

Operator definitions and one-loop calculations for the operators appearing in the
factorization theorem of Eq. (3.8) are given in App. B.

3.1.2 Soft Subjet 

A factorization theorem describing the soft subjet region of phase space was recently
presented in Ref. [76]. In this section we review the basic features of this factorization theorem,
but we refer the reader to Ref. [76] for a more detailed discussion.

Unlike for the case of collinear subjets, in the soft subjet configuration, the wide angle
soft subjet probes the boundary of the jet. This introduces sensitivity to the details of
the jet algorithm used to define the jet, as well as to the measurement made in the region
outside the jet. The factorization theorem of Ref. [76] is valid under the assumption that
an additive IRC safe observable, $B$, is measured in the out-of-jet region, and that the soft
scale associated with this observable, $\Lambda$, satisfies $\Lambda/E_J \ll e_2^{(\alpha)}$. We will therefore assume
that this condition is satisfied throughout this section. However, we will see that the
numerical results are fairly insensitive to the details of the choice of scale $\Lambda$. Ref. [76] also
used a broadening axis [100] cone algorithm to define jets, whereas here we use the anti-$k_T$
algorithm, as relevant for phenomenological applications. We will argue that the structure
of the factorization theorem is in fact identical in the two cases, to leading power.

Mode Structure 

In the soft subjet region of phase space there are two subjets with an energy hierarchy.
We denote the energy fraction of the soft subjet by $z_{sj}$ and the angle from the $n$ axis by $\theta_{sj}$. We
also use the notation $\Delta\theta_{sj} = R - \theta_{sj}$ to denote the angle from the soft subjet axis to
the jet boundary. The modes of the soft subjet are collinear-soft modes, being both soft
and collimated, and we will therefore denote the characteristic angle between them as $\theta_{cs}$.
Straightforward power counting can be applied to determine the scaling of the modes for
both the energetic jet and the soft subjet. Their contributions to the observable are given
by

$$e_2^{(\alpha)} \sim z_{sj}\,, \tag{3.9}$$

$$e_2^{(\beta)} \sim z_{sj}\,, \tag{3.10}$$

$$e_3^{(\alpha)} \sim z_{sj} \left( \theta_c^{\alpha} + z_{sj}\, \theta_{cs}^{\alpha} + z_s \right). \tag{3.11}$$

In the soft subjet region of phase space, we have the relation $e_2^{(\alpha)} \sim e_2^{(\beta)}$, and therefore
these two observables are redundant from a power counting perspective. We will therefore
write the power counting of the modes in terms of $e_2^{(\alpha)}$ and $e_3^{(\alpha)}$.

Figure 7: A schematic depiction of the soft subjet configuration with dominant QCD
radiation and the functions describing its dynamics in the effective field theory is shown in
a). The matching procedure and relevant scales are shown in b), where we have restricted
to the case $\alpha = \beta = 2$ for simplicity.

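From the contributions to the observables above, we find that the momentum of the
collinear and global soft radiation scales like

$$p_c \sim E_J \left( \left(\frac{e_3^{(\alpha)}}{e_2^{(\alpha)}}\right)^{2/\alpha},\, 1,\, \left(\frac{e_3^{(\alpha)}}{e_2^{(\alpha)}}\right)^{1/\alpha} \right)_{n\bar n}, \qquad
p_s \sim E_J\, \frac{e_3^{(\alpha)}}{e_2^{(\alpha)}} \left(1,\, 1,\, 1\right)_{n\bar n}, \tag{3.12}$$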
where $E_J$ is the energy of the jet and $n$ and $\bar n$ are the light-like directions of the jet of
interest and the other jet in the event, respectively. The soft subjet mode’s momentum
scales like



(3.13) 


in the light-cone coordinates defined by the direction of the soft subjet, $n_{sj}$. This is the
complete set of modes defined by the scales set by the measurements of $e_2^{(\alpha)}$ and $e_3^{(\alpha)}$
alone.
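
The scaling of the soft subjet modes can be sketched from the power counting of the observables. As an illustration (not a quotation of Eq. (3.13)), assume that $e_2^{(\alpha)} \sim z_{sj}$, that the collinear-soft radiation inside the soft subjet carries an energy fraction of order $z_{sj}$ and enters $e_3^{(\alpha)}$ at leading power through the term $z_{sj}^2\, \theta_{cs}^{\alpha}$, and that radiation of energy $E$ at angle $\theta$ scales as $E\,(\theta^2, 1, \theta)$ in the light-cone coordinates of the soft subjet axis:

```latex
\begin{align*}
z_{sj} &\sim e_2^{(\alpha)}, \qquad
\theta_{cs}^{\alpha} \sim \frac{e_3^{(\alpha)}}{z_{sj}^{2}}
 \sim \frac{e_3^{(\alpha)}}{\big(e_2^{(\alpha)}\big)^{2}},
\\[4pt]
p_{sj} &\sim z_{sj}\, E_J \left(\theta_{cs}^{2},\, 1,\, \theta_{cs}\right)
 \sim E_J\, e_2^{(\alpha)}
 \left( \left(\frac{e_3^{(\alpha)}}{\big(e_2^{(\alpha)}\big)^{2}}\right)^{2/\alpha},\, 1,\,
        \left(\frac{e_3^{(\alpha)}}{\big(e_2^{(\alpha)}\big)^{2}}\right)^{1/\alpha} \right).
\end{align*}
```

In this reading, the soft subjet modes are collimated, $\theta_{cs} \ll 1$, whenever $e_3^{(\alpha)} \ll (e_2^{(\alpha)})^2$.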

Unlike in the collinear subjet region of phase space there are no collinear-soft modes
required in the effective field theory description, since the soft subjet is at a wide angle
from the jet axis. However, in this region there is an additional mode, termed a boundary
soft mode in Ref. [76], whose appearance is forced by the jet boundary and the energy
veto in the region of phase space outside the jet. These modes do not contribute to the
$e_2$ observables, but are effectively a collinear-soft mode whose angle with respect to the
soft subjet axis is set by the angle to the boundary. The boundary soft mode’s momentum
components scale like

$$p_{bs} \sim E_J\, \frac{e_3^{(\alpha)}}{e_2^{(\alpha)} \left(\Delta\theta_{sj}\right)^{\alpha}} \left( \left(\Delta\theta_{sj}\right)^{2},\, 1,\, \Delta\theta_{sj} \right)_{n_{sj}\bar n_{sj}}, \tag{3.14}$$


written in the light-cone coordinates defined by the soft subjet axis. The boundary soft
modes are required to have a single scale in the soft subjet function. For consistency of the
factorization, we must enforce that the soft subjet modes cannot resolve the jet boundary
and that the boundary soft modes are localized near the jet boundary. That is, the angular
size of the soft subjet modes, $\theta_{cs}$, must be parametrically smaller than that of the boundary
soft modes, namely $\Delta\theta_{sj}$. We therefore find the condition

$$\left(\Delta\theta_{sj}\right)^{\alpha} \gg \left(\theta_{cs}\right)^{\alpha} \sim \frac{e_3^{(\alpha)}}{\big(e_2^{(\alpha)}\big)^{2}}\,, \qquad \text{and} \qquad \Delta\theta_{sj} \ll 1\,. \tag{3.15}$$

Therefore, the factorization theorem applies in a region of the phase space where the soft 
subjet is becoming pinched against the boundary of the jet, but lies far enough away that 
the collinear modes of the soft subjet do not touch the boundary. A schematic depiction of 
this region of phase space, along with a summary of all the relevant modes which appear 
in the factorization theorem is shown in Fig. 7. 

In the soft subjet region of phase space, the choice of jet algorithm plays a crucial role,
since the soft subjet probes the boundary of the jet. In Ref. [76] the factorization theorem in
the soft subjet region of phase space was presented using a broadening axis cone algorithm
with radius $R$. We now show that up to power corrections, the factorization theorem in
the soft subjet region of phase space is identical with either the anti-$k_T$ or broadening
axis cone algorithm. In particular, with the anti-$k_T$ algorithm, the jet boundary is not
deformed by the soft subjet, and can be treated as a fixed cone of radius $R$. This is not
true for other jet algorithms, such as $k_T$ [101, 102] or Cambridge-Aachen [103-105],
where the boundary is deformed by the clustering of soft emissions, a point which has been
emphasized elsewhere (see, e.g., Refs. [106-109]).

The validity of the factorization theorem requires the following two conditions, which
will put constraints on the power counting in the soft subjet region of phase space. First,
the soft subjet must be clustered with the jet axis, rather than with the out-of-jet radiation.
This is guaranteed as long as the soft subjet axis satisfies $\theta_{sj} < R$. Second, the radiation
clustered with the soft subjet from the out-of-jet region should not distort the boundary of
the jet. More precisely, the distortion of the boundary must not modify the value of $e_3^{(\alpha)}$
at leading power (note that the power counting guarantees that it does not modify $e_2^{(\alpha)}$).
The contribution to $e_3^{(\alpha)}$ from a soft out-of-jet emission is given by

$$z_{sj}\, \frac{\Lambda}{E_J} \ll e_3^{(\alpha)} \quad \Longrightarrow \quad \frac{\Lambda}{E_J} \ll \frac{e_3^{(\alpha)}}{e_2^{(\alpha)}}\,. \tag{3.16}$$

Since the out-of-jet scale is in principle a free parameter, we can formally enforce this
condition in our calculations. Corrections due to a deformation of the jet boundary would
enter as power corrections in this region of phase space. The jet boundary therefore acts as
a hard boundary of radius $R$, and the factorization theorem is identical to that presented
in Ref. [76].


Factorization Theorem 


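With an understanding of the precise restrictions on the power counting required for the
validity of the soft subjet factorization theorem, we now discuss its structure. Since we have
argued that the relevant factorization theorem is identical to that presented in Ref. [76],
we will only state the result. The factorization theorem in the soft subjet region with the
out-of-jet scale satisfying $\Lambda \ll E_J$, and with jets defined by the anti-$k_T$ jet algorithm,
is given by

$$\begin{aligned}
\frac{d\sigma(B; R)}{de_2^{(\alpha)}\, de_2^{(\beta)}\, de_3^{(\alpha)}} ={}& \int dB_S\, dB_{J_{\bar n}} \int de_3^{J_n}\, de_3^{J_{sj}}\, de_3^{S}\, de_3^{S_{sj}}\;
\delta\!\left(B - B_{J_{\bar n}} - B_S\right) \delta\!\left(e_3^{(\alpha)} - e_3^{J_n} - e_3^{J_{sj}} - e_3^{S} - e_3^{S_{sj}}\right) \\
& \times H_{n\bar n}(Q^2)\, H^{sj}_{n\bar n}\!\left(e_2^{(\alpha)}, e_2^{(\beta)}\right) J_n\!\left(e_3^{J_n}\right) J_{\bar n}\!\left(B_{J_{\bar n}}\right)
S_{n\bar n n_{sj}}\!\left(e_3^{S}; B_S; R\right) J_{n_{sj}}\!\left(e_3^{J_{sj}}\right) S_{n_{sj}\bar n_{sj}}\!\left(e_3^{S_{sj}}; R\right).
\end{aligned} \tag{3.17}$$

In this expression we have explicitly indicated the dependence on the jet boundaries with
the jet radius $R$. A brief description of the functions appearing in Eq. (3.17) is as follows: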


• $H_{n\bar n}(Q^2)$ is the hard function describing the underlying short distance process. In
this case we consider $e^+e^- \to q\bar q$.

• $H^{sj}_{n\bar n}\left(e_2^{(\alpha)}, e_2^{(\beta)}\right)$ is the hard function describing the production of the soft subjet
coherently from the initial $q\bar q$ dipole, and describes dynamics at the scale set by
$e_2^{(\alpha)}$ and $e_2^{(\beta)}$.

• $J_n\left(e_3^{(\alpha)}\right)$ is a jet function at the scale $e_3^{(\alpha)}$ describing the hard collinear modes of the
identified jet along the $n$ direction.

• $J_{\bar n}(B)$ is a jet function describing the collinear modes of the out-of-jet region of the
event.

• $S_{n\bar n n_{sj}}\left(e_3^{(\alpha)}; B; R\right)$ is the global soft function involving three Wilson line directions,
$n$, $\bar n$, $n_{sj}$. The global soft function depends explicitly on both the out-of-jet
measurement and the jet radius.

• $J_{n_{sj}}\left(e_3^{(\alpha)}\right)$ is a jet function describing the dynamics of the soft subjet modes, which
carry the bulk of the energy in the soft subjet.

• $S_{n_{sj}\bar n_{sj}}\left(e_3^{(\alpha)}; R\right)$ is a soft function describing the dynamics of the boundary soft modes.
It depends only on two Wilson line directions $n_{sj}$, $\bar n_{sj}$.

These functions, and a schematic depiction of the radiation which they define, are indicated
in Fig. 7, along with a schematic depiction of the multistage matching procedure
from QCD onto the effective theory, as described in detail in Ref. [76]. Although we will
not discuss any details of the matching procedure, it is important to note that it occurs
through a refactorization of the soft function, and hence the soft subjet factorization theorem
is sensitive to the global color structure of the event, since the soft subjet is emitted
coherently from all eikonal lines. This should be contrasted with the case of the collinear
subjets factorization theorem, where the matching occurs through a refactorization of the
jet function.

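In the soft subjet region of phase space, we can relate the variables $e_2^{(\alpha)}$, $e_2^{(\beta)}$ to the
physically more transparent $z_{sj}$, $\theta_{sj}$ variables with a simple Jacobian factor, giving the
factorization theorem

$$\begin{aligned}
\frac{d\sigma(B; R)}{dz_{sj}\, d\theta_{sj}\, de_3^{(\alpha)}} ={}& \int dB_S\, dB_{J_{\bar n}} \int de_3^{J_n}\, de_3^{J_{sj}}\, de_3^{S}\, de_3^{S_{sj}}\;
\delta\!\left(B - B_{J_{\bar n}} - B_S\right) \delta\!\left(e_3^{(\alpha)} - e_3^{J_n} - e_3^{J_{sj}} - e_3^{S} - e_3^{S_{sj}}\right) \\
& \times H_{n\bar n}(Q^2)\, H^{sj}_{n\bar n}\!\left(z_{sj}, \theta_{sj}\right) J_n\!\left(e_3^{J_n}\right) J_{\bar n}\!\left(B_{J_{\bar n}}\right)
S_{n\bar n n_{sj}}\!\left(e_3^{S}; B_S; R\right) J_{n_{sj}}\!\left(e_3^{J_{sj}}\right) S_{n_{sj}\bar n_{sj}}\!\left(e_3^{S_{sj}}; R\right).
\end{aligned} \tag{3.18}$$

Operator definitions and one-loop calculations for the operators appearing in the
factorization theorems of Eqs. (3.17) and (3.18) are given in App. C.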

3.1.3 Soft Haze 

The soft haze region defines the upper boundary of the $e_2^{(\beta)}, e_3^{(\alpha)}$ phase space. In this region
of phase space jets consist of a single hard core, with no resolved subjets. A factorization
theorem describing this region of phase space has not been presented elsewhere, but can be
straightforwardly formulated in standard SCET involving only $n$ and $\bar n$ collinear sectors.

As discussed in Sec. 2.2, the power counting in the soft haze region depends sensitively
on the relative values of $\alpha$ and $\beta$, and therefore so does the structure of the factorization
theorem. Since, from Eq. (2.10), we restrict ourselves to $\alpha \geq \beta$, we will for simplicity only
discuss the factorization theorems valid in this case. Factorization theorems for other values
of $\alpha$ and $\beta$ can be determined by performing a similar analysis.

Figure 8: A schematic depiction of the soft haze configuration where no subjets are
resolved, with dominant QCD radiation and the functions describing its dynamics in the
effective field theory is shown in a). The relevant scales in the effective field theory are
shown in b), where we have restricted to the case $\alpha = \beta = 2$ for simplicity.


Mode Structure 

In the soft haze region the observables have the power counting

$$e_2^{(\alpha)} \sim z_s + \theta_c^{\alpha}\,, \tag{3.19}$$

$$e_2^{(\beta)} \sim z_s + \theta_c^{\beta}\,, \tag{3.20}$$

$$e_3^{(\alpha)} \sim z_s^2 + \theta_c^{\alpha} z_s + \theta_c^{3\alpha}\,, \tag{3.21}$$

where we have not yet dropped power suppressed terms. We are interested in the factorization
theorem on the upper boundary, with the scaling $e_3^{(\alpha)} \sim (e_2^{(\beta)})^2$. 4 We now assume
$\alpha > \beta$. In this case, dropping power suppressed terms, the appropriate power counting is

$$e_2^{(\alpha)} \sim z_s\,, \tag{3.22}$$

$$e_2^{(\beta)} \sim z_s + \theta_c^{\beta}\,, \tag{3.23}$$

$$e_3^{(\alpha)} \sim z_s^2\,. \tag{3.24}$$

4 There is another parametric choice for the relative scaling of the 2-point energy correlation functions
[74], though it does not extend to the upper boundary of the phase space. If $(e_2^{(\alpha)})^{\beta} \sim (e_2^{(\beta)})^{\alpha}$, then the
power counting is

$$e_2^{(\alpha)} \sim z_s + \theta_c^{\alpha}\,, \qquad e_2^{(\beta)} \sim \theta_c^{\beta}\,, \qquad e_3^{(\alpha)} \sim z_s^2 + \theta_c^{\alpha} z_s\,,$$

with both 2-point correlation functions dominated by collinear physics. For $\alpha > \beta$, this region has the
scaling $e_3^{(\alpha)} \sim (e_2^{(\beta)})^{2\alpha/\beta}$, which does not extend to the upper boundary.

It is also interesting to consider the case $\alpha = \beta$ because in the soft haze region it is
not necessary to measure two different 2-point energy correlation functions, unlike in the
two-prong region of phase space. In the case that $\alpha = \beta$, we have instead,

$$e_2^{(\alpha)} \sim z_s + \theta_c^{\alpha}\,, \tag{3.25}$$

$$e_3^{(\alpha)} \sim z_s^2 + \theta_c^{\alpha} z_s\,, \tag{3.26}$$

where the second term in the expression for $e_3^{(\alpha)}$ is no longer power suppressed. This will
modify the factorization theorem between the two cases.

In both cases, the scaling of the modes is then given by

$$p_c \sim E_J \left( \big(e_2^{(\beta)}\big)^{2/\beta},\, 1,\, \big(e_2^{(\beta)}\big)^{1/\beta} \right)_{n\bar n}, \tag{3.27}$$

$$p_s \sim e_2^{(\beta)}\, E_J \left(1,\, 1,\, 1\right)_{n\bar n}, \tag{3.28}$$

with $\beta = \alpha$ in the second case. Here $E_J$ is the energy of the jet and the subscripts denote
the light-like directions with respect to which the momentum is decomposed. This scaling
should be recognized as the usual power counting of the collinear and soft modes for the
angularities with angular exponent $\beta$ [74, 110].


Factorization Theorem 


The factorization theorem in the soft haze region of phase space can now be straightforwardly
read off from the power counting expressions of the previous sections. We state it
both for the case $\alpha = \beta$ and $\alpha > \beta$. For $\alpha > \beta$, we have

$$\frac{d\sigma}{de_2^{(\alpha)}\, de_2^{(\beta)}\, de_3^{(\alpha)}} = H_{n\bar n}(Q^2)\, J_{\bar n}(B) \int de_2^{c}\, de_2^{s}\;
\delta\!\left(e_2^{(\beta)} - e_2^{c} - e_2^{s}\right) J_n\!\left(e_2^{c}\right) S_{n\bar n}\!\left(e_2^{s}, e_2^{(\alpha)}, e_3^{(\alpha)}, R, B\right), \tag{3.29}$$

where we have suppressed the convolution over the out-of-jet measurement $B$, to focus on
the structure of the in-jet measurements. For $\alpha = \beta$, the factorization theorem takes an
interesting form 5

$$\begin{aligned}
\frac{d\sigma}{de_2^{(\alpha)}\, de_3^{(\alpha)}} ={}& H_{n\bar n}(Q^2)\, J_{\bar n}(B) \int de_2^{c}\, de_2^{s}\, de_3^{s}\;
\delta\!\left(e_2^{(\alpha)} - e_2^{c} - e_2^{s}\right) \delta\!\left(e_3^{(\alpha)} - e_2^{c}\, e_2^{\prime s} - e_3^{s}\right) \\
& \times J_n\!\left(e_2^{c}\right) S_{n\bar n}\!\left(e_2^{s}, e_2^{\prime s}, e_3^{s}, R, B\right),
\end{aligned} \tag{3.30}$$

where again the convolution over $B$ has been suppressed.

5 When calculating the tail of the $D_2$ distribution, one might be tempted to marginalize over $e_2^{(\beta)}$ in
Eq. (3.29). This naive marginalization does not yield the correct result. Rather, if one started the derivation
of the factorization theorem with only the measurements of $e_2^{(\alpha)}$ and $e_3^{(\alpha)}$ imposed, so that all possible $e_2^{(\beta)}$
configurations are integrated over, then Eq. (3.30) would be obtained. Thus Eq. (3.30) is the correct
marginalization over $e_2^{(\beta)}$ in Eq. (3.29).

A brief description of the functions appearing in the factorization theorems is as follows:

• $H_{n\bar n}(Q^2)$ is the hard function describing the underlying short distance process. In
this case we consider $e^+e^- \to q\bar q$.

• $J_{\bar n}(B)$ is the jet function describing the collinear modes for the recoiling jet.

• $J_n\!\left(e_2^{c}\right)$ is the jet function describing the collinear modes for the jet in the $n$ direction.

• $S_{n\bar n}\!\left(e_2^{s}, e_2^{\prime s}, e_3^{s}, R, B\right)$ and $S_{n\bar n}\!\left(e_2^{s}, e_2^{(\alpha)}, e_3^{(\alpha)}, R, B\right)$ are soft functions describing the
global soft radiation from the $n\bar n$ dipole. In the case of $\alpha = \beta$, an additional energy
correlation, $e_2^{\prime s}$, is measured on the soft radiation, with an angular factor of $2\alpha$,
which multiplies against the collinear contribution to $e_2^{(\alpha)}$ when contributing to $e_3^{(\alpha)}$.
These also carry the jet algorithm constraints denoted by $R$, and any out-of-jet
measurements $B$.

These functions, and a schematic depiction of the radiation which they define, are indicated
in Fig. 8. In App. F, we give operator definitions of these functions and the leading-power
expression for the $e_3^{(\alpha)}$ measurement operator in the soft function.

There are several interesting features about the factorization theorems of Eqs. (3.29)
and (3.30). First, the soft functions are multi-differential, in that they require the
simultaneous measurement of multiple quantities. Such multi-differential jet and soft functions
have been discussed in detail in Refs. [74, 75]. One other interesting feature of the
factorization theorem of Eq. (3.30), for the case of equal angular exponents, is the appearance of
the product structure in the $\delta$-function defining the value of $e_3^{(\alpha)}$. This product structure
follows from the power counting of Eq. (3.25) which describes the properties of the 3-point
energy correlation function in the soft and collinear limits. It is important to note that
this product form does not violate soft-collinear factorization, since only the knowledge of
the total $e_2^{(\alpha)}$ of the soft or collinear sector is required.
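
The product structure can be illustrated on a toy configuration with one hard particle, one collinear emission and one soft wide-angle emission, for equal angular exponents. This is a sketch under stated assumptions: the $e^+e^-$ form of the correlators with angular weight $\left(2 p_i\cdot p_j/(E_i E_j)\right)^{\alpha/2}$ is assumed, the soft quantity `e2_prime_soft` is one reading of the additional soft correlator with angular factor $2\alpha$ (measured here about the hard particle direction), and all names and numerical values are hypothetical.

```python
import math

ALPHA = 2.0  # equal angular exponents, alpha = beta


def massless(energy, theta, phi):
    """Four-momentum (E, px, py, pz) of a massless particle."""
    return (
        energy,
        energy * math.sin(theta) * math.cos(phi),
        energy * math.sin(theta) * math.sin(phi),
        energy * math.cos(theta),
    )


def angle_factor(p, q, exponent):
    """(2 p.q / (E_p E_q))^(exponent/2); reduces to theta^exponent at small angle."""
    dot = p[0] * q[0] - p[1] * q[1] - p[2] * q[2] - p[3] * q[3]
    return max(2.0 * dot / (p[0] * q[0]), 0.0) ** (exponent / 2.0)


hard = massless(100.0, 0.0, 0.0)
collinear = massless(20.0, 0.01, 0.0)  # energetic, small angle to the hard core
soft = massless(0.5, 1.0, 0.0)         # low energy, wide angle

e_jet = hard[0] + collinear[0] + soft[0]
z_h, z_c, z_s = (p[0] / e_jet for p in (hard, collinear, soft))

# Full three-point correlator: a single term for three particles.
e3_full = (
    z_h * z_c * z_s
    * angle_factor(hard, collinear, ALPHA)
    * angle_factor(hard, soft, ALPHA)
    * angle_factor(collinear, soft, ALPHA)
)

# Collinear-sector contribution to the two-point correlator.
e2_collinear = z_h * z_c * angle_factor(hard, collinear, ALPHA)

# Soft-sector two-point correlator with angular exponent 2 * alpha (assumed reading).
e2_prime_soft = z_s * angle_factor(hard, soft, 2.0 * ALPHA)

product = e2_collinear * e2_prime_soft
relative_difference = abs(e3_full - product) / e3_full
print(e3_full, product, relative_difference)
```

The residual difference arises because the soft particle sees the collinear emission at a slightly different angle than the hard core; it is of relative size $\theta_c/\theta_s$ and vanishes in the strict collinear limit, consistent with the product form holding at leading power.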

The soft contribution to the 3-point energy correlation is first non-vanishing with
two real emissions. Therefore at one-loop, the factorization theorem of Eq. (3.29) reduces
exactly to the factorization theorem for the multi-differential angularities studied in
Refs. [74, 75], whereas the factorization theorem of Eq. (3.30) reduces to the factorization
theorem for a single angularity. In this paper, we will not perform the two-loop calculation
necessary to obtain a non-trivial contribution to the three point energy correlation
function. Instead, we will obtain an approximation to the cross section in this region by
taking a limit of our factorization theorems in the two-prong region of phase space. This
is possible, because as we will show in Sec. 4.3 by studying the fixed order distributions
for the observable $D_2$, there is no fixed order singularity in the soft haze region of phase
space in the presence of a mass cut. This implies that the resummation is not needed to
regulate a fixed order singularity. This will be discussed in Sec. 4.4.2. The field theoretic
definitions of the functions appearing in the factorization theorem of Eq. (3.29) as well as
power expansions of the measurement operators are collected in App. F. However, because
we do not explicitly use the results of the soft haze factorization theorem
in our calculation, we simply refer the reader to Refs. [74, 75] for the calculations of the
one-loop functions relevant to the factorization theorems of Eqs. (3.29) and (3.30), and
leave for future work the full two-loop calculation.

3.1.4 Refactorization of the Global Soft Function 

In each of the factorization theorems required for the description of QCD background jets,
namely the collinear subjets, soft subjet, and soft haze factorization theorems, there is a
global soft function, which is sensitive to both the in-jet measurement of the energy correlation
functions, as well as the out-of-jet measurement $B$. To ensure that all large logarithms
are resummed by the renormalization group evolution, we must perform a refactorization of
the soft function [60, 62, 110-112]. This ensures that the only logarithms which appear in a
given soft function that are sensitive to both in-jet and out-of-jet scales are true non-global
logarithms (NGLs) [78], which first appear at two-loop order in the calculation of a particular
soft function. 6 7 Here we focus on the refactorization of the collinear subjets and soft
subjet factorization theorems of Secs. 3.1.1 and 3.1.2, which will be used in our numerical
calculation. For both of these factorization theorems, we can write the soft function to all
orders in $\alpha_s$ as

$$S\!\left(e_3^{(\alpha)}; B; R; \mu\right) = S^{(\text{out})}\!\left(B; R; \mu\right) S^{(\text{in})}\!\left(e_3^{(\alpha)}; R; \mu\right) S_{\text{NGL}}\!\left(e_3^{(\alpha)}; B; R\right), \tag{3.31}$$

where we have explicitly indicated the renormalization scale $\mu$ dependence [113]. The
non-global part of the soft function, $S_{\text{NGL}}\!\left(e_3^{(\alpha)}; B; R\right)$, is first non-trivial at two-loop order,
beyond the accuracy to which we explicitly calculated the soft functions in this paper.
Furthermore, the anomalous dimension of the soft function factorizes to all orders in
perturbation theory as

$$\gamma_S\!\left(e_3^{(\alpha)}; B; R; \mu\right) = \gamma_S^{(\text{out})}\!\left(B; R; \mu\right) + \gamma_S^{(\text{in})}\!\left(e_3^{(\alpha)}; R; \mu\right), \tag{3.32}$$

and therefore the renormalization group kernels factorize as well. Briefly, this occurs because
renormalization group consistency relates the soft anomalous dimension to the sum
of all the other anomalous dimensions, each of which can be associated with the in-jet or
out-of-jet contributions.
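
To see why an additive anomalous dimension leads to factorized evolution kernels, consider a multiplicatively renormalized soft function. This is a schematic illustration only: it assumes that any convolutions in $e_3^{(\alpha)}$ and $B$ have been diagonalized (for example by passing to a conjugate space), suppresses all arguments other than $\mu$, treats $S_{\text{NGL}}$ as carrying no additional $\mu$ dependence at the order considered, and introduces the kernel names $U^{(\text{out})}$, $U^{(\text{in})}$ and the reference scales $\mu_{\text{out}}$, $\mu_{\text{in}}$ purely for illustration.

```latex
\begin{align*}
\mu \frac{d}{d\mu}\, S^{(i)}(\mu) &= \gamma_S^{(i)}(\mu)\, S^{(i)}(\mu),
\qquad i \in \{\text{out}, \text{in}\},
\\[4pt]
S^{(i)}(\mu) &= U^{(i)}(\mu, \mu_i)\, S^{(i)}(\mu_i),
\qquad
U^{(i)}(\mu, \mu_i) = \exp\!\left[ \int_{\mu_i}^{\mu} \frac{d\mu'}{\mu'}\, \gamma_S^{(i)}(\mu') \right],
\\[4pt]
S(\mu) &= U^{(\text{out})}(\mu, \mu_{\text{out}})\, U^{(\text{in})}(\mu, \mu_{\text{in}})\;
S^{(\text{out})}(\mu_{\text{out}})\, S^{(\text{in})}(\mu_{\text{in}})\, S_{\text{NGL}} .
\end{align*}
```

With $\gamma_S = \gamma_S^{(\text{out})} + \gamma_S^{(\text{in})}$, the product $U^{(\text{out})} U^{(\text{in})}$ reproduces the evolution of the full soft function, while each factor can be started at its own natural scale, so that each boundary function contains logarithms of a single scale.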

While similar refactorizations of the global soft function have been discussed previously,
and used in numerical calculations (see especially Ref. [62] for a detailed discussion), we
will discuss it here for completeness. The refactorization of the global soft function plays
a role in our numerical results and is particularly important in appropriately separating
scales in the global soft function of the soft subjet factorization theorem of Sec. 3.1.2. In
Ref. [76] the structure of the one-loop calculation of the soft subjet factorization theorem
was discussed in detail, with a particular focus on the dependence on the angle $\Delta\theta_{sj}$
between the soft subjet and the boundary. There it was found that while the out-of-jet
soft function contained dependence on the angle between the soft subjet and the boundary,
$\Delta\theta_{sj}$, this dependence vanishes in the in-jet contribution to the soft function due to a zero
bin subtraction. Renormalization group consistency is achieved since the $\Delta\theta_{sj}$ dependence
associated with the in-jet region is carried by the boundary soft function. Therefore, the
refactorization of the global soft function for the soft subjet factorization theorem allows
the soft function to be separated into a piece with $\Delta\theta_{sj}$ dependence, and a piece with no
$\Delta\theta_{sj}$ dependence, and is crucial for resumming all large logarithms associated with this
scale. The one-loop anomalous dimensions, split into out-of-jet and in-jet contributions, as
well as canonical scales for both the in-jet and out-of-jet soft functions are given in App. B,
App. C, and App. D. Further details of this refactorization, and in particular a discussion
of the dependence on $\Delta\theta_{sj}$, are also given there.

6 It is important to emphasize that throughout this section we refer to the NGLs which appear in the soft
function of a given factorization theorem, and the order in $\alpha_s$ at which they will appear in this particular
soft function. Because we combine distinct factorization theorems, some of which include hard splitting
functions, or eikonal emission functions, this order is in general distinct from the order at which they will
appear in the total cross section, which can be different for each factorization theorem. This combination
of the factorization theorems is completely independent from the refactorization of the soft function in a
particular factorization theorem.

7 As discussed in Ref. [62] there is some ambiguity in how the hard function, for example, is associated
with the in-jet or out-of-jet anomalous dimensions, but this does not affect the above argument.

For completeness, we also give the final refactorized expressions for the collinear subjets
and soft subjet factorization theorems that will be used when presenting numerical results.
For the collinear subjets factorization theorem, we have


[Eq. (3.33): refactorized collinear subjets factorization theorem]

while for the soft subjet factorization theorem, we have

[Eq. (3.34): refactorized soft subjet factorization theorem]




In this form, each function in Eqs. (3.33) and (3.34) contains logarithms of a single scale, 
which can be resummed through renormalization group evolution. 


3.2 Boosted Boson Signal 

In this section we discuss the effective field theory and factorization theorem relevant for 
the hadronically-decaying boosted boson signal. For concreteness, we will consider the 
case of a boosted Z boson decaying to a massless $q\bar{q}$ pair; however, the extension to other
color-neutral boosted particles is trivial. We will work in the narrow width approximation,
setting the width of the Z boson $\Gamma_Z = 0$. Corrections to this approximation are trivial to
implement, as they do not modify the structure of the factorization, and are expected to 
have a minimal effect.

Figure 9: A schematic depiction of the boosted Z boson configuration with dominant
QCD radiation and the functions describing its dynamics in the effective field theory is
shown in a). The relevant scales, ordered in virtuality, are summarized in b), where we
have restricted to the case $\alpha = \beta = 2$ for simplicity.

A factorization theorem for the N-subjettiness observable [63, 64, 95] measured
on boosted Z jets was presented in Ref. [41]. This factorization theorem was obtained by
boosting an appropriately chosen $e^+e^-$ event shape. A factorization theorem can also be
formulated using the SCET$_+$ effective theory,$^8$ where the collinear-soft mode, which was
described in Sec. 3.1.1, corresponds to the boosted soft mode of the $e^+e^-$ event shape.
We will take this second approach, as it is in line with the general spirit of this paper of
developing effective field theory descriptions of jet substructure configurations. However,
the approach of relating to boosted $e^+e^-$ event shape variables is useful for relating results
to higher order calculations known in the literature. Although the factorization
for the energy correlation functions in the signal region follows straightforwardly from that
of Ref. [41], or from the SCET$_+$ factorization theorem of Sec. 3.1.1, we will discuss it here
for completeness.

8 Here we have slightly extended the usage of the SCET$_+$ nomenclature beyond that in which it was
originally used in Ref. [77]. In particular, in the case of the signal distribution, there are no global soft modes,
and the matching to the effective theory proceeds in quite a different way than for the case of a two-prong
QCD jet as originally considered in Ref. [77]. Nevertheless, because the effective theory contains a
collinear-soft mode, we will refer to it as SCET$_+$.

We assume the process $e^+e^- \to ZZ \to q\bar{q}l\bar{l}$, where $l$ is a lepton, to avoid having to
describe additional jets, although the extension to two hadronically-decaying Z bosons is
trivial. The factorization theorem is then similar to that presented in Sec. 3.1.1; however,
there are no global soft modes since the Z is a color singlet. The scaling of the collinear
and collinear-soft modes is identical to that given in Sec. 3.1.1, so we do not repeat it
here. The factorization theorem is given by



(3.35) 


As with the factorization theorem in Sec. 3.1.1, we have chosen to write the factorization
theorem in terms of $e_2^{(\alpha)}$, $e_3^{(\alpha)}$, and the energy fraction of one of the subjets, $z$. A brief
description of the functions appearing in Eq. (3.35) is as follows:


• $H(Q^2)$ is the hard function describing the production of the on-shell Z bosons in an
$e^+e^-$ collision. It also includes the leptonic decay of the Z boson. Following Ref. [41]
we assume that the Z boson is unpolarized and so its decay matrix element is flat in
the cosine of the boost angle. Non-flat distributions corresponding to some particular
decay or production mechanism are straightforward to include.

• $P_{n \to n_a, n_b}$ describes the decay of the on-shell Z boson into a $q\bar{q}$ pair with
momenta along the $n_a$ and $n_b$ axes.

• $J_{n_a}\left(z; e_3^{(\alpha)}\right)$ and $J_{n_b}\left(1 - z; e_3^{(\alpha)}\right)$ are the jet functions describing the collinear radiation
associated with the two collinear subjets.

• $S^+_{n_a n_b}\left(e_3^{(\alpha)}\right)$ is the collinear-soft function describing the radiation from the $q\bar{q}$ dipole
formed by the two collinear subjets.
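
The flat decay distribution assumed in the hard function has a simple kinematic consequence, which the following minimal sketch illustrates. It is not part of the calculation presented here: the function names, the numerical values of the Z mass and energy, and the use of the $e^+e^-$ definition of the 2-point energy correlation function with $\alpha = 2$ are assumptions made for illustration only. For a Z boson of velocity $\beta_Z$ decaying to massless quarks at rest-frame angle $\theta^*$ to the boost axis, the energy fraction of one quark is $z = (1 + \beta_Z \cos\theta^*)/2$, so a distribution flat in $\cos\theta^*$ is flat in $z$.

```python
import math

M_Z = 91.1876   # GeV, assumed value of the Z mass
E_Z = 500.0     # GeV, assumed Z energy (illustrative)

def boosted_decay(cos_theta_star, m=M_Z, energy=E_Z):
    """Boost a two-body decay to massless quarks along the z axis.

    Returns the lab-frame four-momenta (E, px, py, pz) of the two quarks.
    """
    gamma = energy / m
    beta = math.sqrt(1.0 - 1.0 / gamma**2)
    sin_theta_star = math.sqrt(1.0 - cos_theta_star**2)
    quarks = []
    for sign in (+1.0, -1.0):
        e_rest = 0.5 * m                       # each quark carries m/2 at rest
        pz_rest = sign * e_rest * cos_theta_star
        px_rest = sign * e_rest * sin_theta_star
        e_lab = gamma * (e_rest + beta * pz_rest)
        pz_lab = gamma * (pz_rest + beta * e_rest)
        quarks.append((e_lab, px_rest, 0.0, pz_lab))
    return quarks

def energy_fraction(cos_theta_star, m=M_Z, energy=E_Z):
    """Energy fraction z carried by the first quark in the lab frame."""
    q1, _ = boosted_decay(cos_theta_star, m, energy)
    return q1[0] / energy

def e2_alpha2(q1, q2):
    """2-point energy correlation function with alpha = 2 for two massless particles."""
    dot = q1[0] * q2[0] - q1[1] * q2[1] - q1[2] * q2[2] - q1[3] * q2[3]
    total_energy = q1[0] + q2[0]
    return 2.0 * dot / total_energy**2
```

Under these assumptions `e2_alpha2` returns $m_Z^2/E_Z^2$ for every decay angle, while `energy_fraction` varies linearly with $\cos\theta^*$.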


The basic structure of the factorization theorem, and the radiation described by the
different functions, as well as their scalings, are shown schematically in Fig. 9. Operator
definitions and one-loop calculations for the operators appearing in the factorization
theorem of Eq. (3.35) are given in App. E. Because the collinear-soft modes are boosted, the
collinear-soft function does not require a refactorization, as was necessary for the global
soft functions in Sec. 3.1.4.

It is important to emphasize the distinction between our treatment of a boosted Z jet, 
where we presented a single factorization theorem, and a massive QCD jet, where three 
distinct factorization theorems were required. While it is obvious that the soft haze region 
does not exist for a boosted Z jet, the soft subjet region does. However, unlike the case of 
a massive QCD jet, where the soft subjet region is enhanced by a factor of $1/z_{sj}$ from the
eikonal emission factor, no such enhancement exists for the Z decay. Indeed, it was shown
in Ref. [41] that the effect of the jet boundary, which would arise from the soft subjet
configuration, is power suppressed by $1/Q$. While it would be potentially interesting to
analytically study the jet radius dependence for the signal distribution using the soft subjet 
factorization theorem, this is beyond the scope of this paper. We will therefore neglect jet 
radius effects and write the factorization theorem in Eq. (3.35) with no R dependence. 

The factorization theorem of Eq. (3.35) provides an accurate description of the boosted
boson signal in the two-prong region of phase space, where $e_3^{(\alpha)} \ll \left(e_2^{(\alpha)}\right)^3$. However, to be
able to compare the signal and background distributions, a valid description of the region
$e_3^{(\alpha)} > \left(e_2^{(\alpha)}\right)^3$ is also required. Unlike for the case of a massive QCD jet, where this region
is described by the soft haze factorization theorem, for a boosted Z boson an accurate
description of this region requires matching to the fixed order $Z \to q\bar{q}g$ matrix element.
Since the boost of the Z boson is fixed, this corresponds to a hard gluon emission from
the $q\bar{q}$ dipole. In the numerical results shown throughout the paper, we have performed
this matching to fixed order directly within the SCET$_+$ effective theory. The fixed order
cross section onto which the result of the factorization theorem was matched
was calculated numerically by boosting the leading order $e^+e^- \to q\bar{q}g$ matrix element and
performing a Monte Carlo integration. This allows for the consideration of general angular
exponents $\alpha$ and $\beta$, in which case the required integrals are difficult, if not impossible, to
evaluate analytically.


4 A Factorization Friendly Two-Prong Discriminant 

The approach to two-prong discrimination taken in this paper is to use calculability and
factorizability constraints to guide the construction of an observable. Having understood in
detail the structure of the $e_2^{(\alpha)}, e_2^{(\beta)}, e_3^{(\alpha)}$ phase space, along with the effective field theories
describing each parametric region, we now show how a powerful two-prong discriminant,
$D_2$, emerges naturally from this analysis. After defining the $D_2$ observable, we discuss
some of its interesting properties, and show that the factorization theorems of Sec. 3 can
be combined to give a factorized description of the observable over the entire phase space.


4.1 Defining $D_2$

The goal of boosted boson discrimination is to define observables which distinguish between 
one- and two-prong jets. As a simplification, we will take the view that both collinear 
and soft subjets should be treated as two-pronged by the discriminant, while soft haze 
jets should be treated as one-pronged. Treating both the collinear and soft subjets as 
two-pronged immediately implies that a marginalization over the soft subjet and collinear 
subjet factorization theorems will need to be performed to obtain a prediction for the two- 
prong discriminant. This will be discussed in Sec. 4.4. A more sophisticated observable 
could take advantage of the different fraction of signal and QCD jets in the soft subjet 
and collinear subjets regions of phase space, and we will give a simple example of such an 
observable in Sec. 5.7. 

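We will consider discriminants, which we denote $D_2^{(\alpha,\beta)}$, which parametrize a family
of contours in the $e_2^{(\beta)}, e_3^{(\alpha)}$ plane, as shown schematically in Fig. 10. Such observables can
be calculated by marginalizing the double differential cross section [70]

$$\frac{d\sigma}{dD_2^{(\alpha,\beta)}} = \int de_2^{(\beta)}\, de_3^{(\alpha)}\, \delta\left(D_2^{(\alpha,\beta)} - D_2^{(\alpha,\beta)}\left(e_2^{(\beta)}, e_3^{(\alpha)}\right)\right) \frac{d^2\sigma}{de_2^{(\beta)}\, de_3^{(\alpha)}}\,. \tag{4.1}$$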

Figure 10: a) Contours of the observable $D_2$ in the $e_2^{(\beta)}, e_3^{(\alpha)}$ plane. b) Sample $D_2$ spectra
for boosted Z bosons and QCD jets, generated in Monte Carlo. Angular exponents
$\alpha = \beta = 2$ have been used.

For the observable to be calculable using the factorization theorems of Sec. 3, the
curves over which the marginalization is performed in Eq. (4.1) must lie entirely in a region
of phase space in which there is a description in terms of a single effective field theory (up to
the marginalization over the collinear and soft subjets). Stated another way, the contours
of $D_2^{(\alpha,\beta)}$ must lie either entirely in the one-prong region of phase space, or entirely in
the two-prong region of phase space. This condition is also natural from the perspective
that the $D_2^{(\alpha,\beta)}$ provide good discrimination power, a point which has been emphasized in
Refs. [66, 67]. If the contours do not respect the parametric scalings of the phase space,
the marginalization cannot be performed within a single effective field theory. A more
sophisticated interpolation between the different effective field theories, along the lines of
Refs. [74, 75], is then required.
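
The marginalization of Eq. (4.1) can be carried out numerically by binning a double differential distribution along the contours of the discriminant. The following is a minimal sketch under stated assumptions, not the procedure used for the results of this paper: the function `marginalize`, the midpoint-rule grid, the flat toy density, and the contour family $e_3/e_2^3$ used in the usage note are all hypothetical choices made for illustration.

```python
import bisect

def marginalize(density, contour, e2_range, e3_range, d_edges, n=600):
    """Midpoint-rule estimate of d(sigma)/dD in bins of the discriminant D.

    density(e2, e3): double differential cross section (toy input)
    contour(e2, e3): value of the discriminant at a phase space point
    d_edges:         increasing list of bin edges in D
    """
    (e2_lo, e2_hi), (e3_lo, e3_hi) = e2_range, e3_range
    h2 = (e2_hi - e2_lo) / n
    h3 = (e3_hi - e3_lo) / n
    n_bins = len(d_edges) - 1
    weights = [0.0] * n_bins
    for i in range(n):
        e2 = e2_lo + (i + 0.5) * h2
        for j in range(n):
            e3 = e3_lo + (j + 0.5) * h3
            # The delta function in D is replaced by a bin assignment.
            k = bisect.bisect_right(d_edges, contour(e2, e3)) - 1
            if 0 <= k < n_bins:
                weights[k] += density(e2, e3) * h2 * h3
    # Divide by the bin width to obtain a differential distribution.
    return [w / (d_edges[k + 1] - d_edges[k]) for k, w in enumerate(weights)]
```

As a check of the sketch, a flat density on $0.1 < e_2 < 0.3$, $0 < e_3 < 0.03$ with contours $D = e_3/e_2^3$ gives $d\sigma/dD = \int de_2\, e_2^3 = (0.3^4 - 0.1^4)/4$ for $D < 1$, which the binned estimate reproduces up to discretization error.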

In Sec. 2, a power counting analysis was used to show that for $3\alpha/\beta > 2$, the one- and
two-prong regions of phase space are parametrically separated, with the contour separating
them scaling as $e_3^{(\alpha)} \sim \left(e_2^{(\beta)}\right)^{3\alpha/\beta}$. This implies that, parametrically, the optimal two-prong
discriminant formed from $e_2^{(\beta)}$ and $e_3^{(\alpha)}$ is

$$D_2^{(\alpha,\beta)} = \frac{e_3^{(\alpha)}}{\left(e_2^{(\beta)}\right)^{3\alpha/\beta}}\,. \tag{4.2}$$

This extends the definition of Ref. [66], which considered the observable $D_2^{(\alpha,\alpha)}$ with equal
angular exponents. To simplify our notation, we will often not explicitly write the angular
exponents $\alpha$ and $\beta$, referring to the observable simply as $D_2$.
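
As a concrete illustration of Eq. (4.2), the following minimal sketch evaluates the discriminant on a list of massless four-momenta. It assumes the $e^+e^-$ definition of the energy correlation functions, in which each pair of particles enters with the angular factor $2 p_i \cdot p_j / (E_i E_j)$ raised to the power $\beta/2$ (or $\alpha/2$ for the 3-point function); this definition is not restated in this section. The function names and the toy jets mentioned in the usage note are ours and purely illustrative.

```python
import math
from itertools import combinations

def massless(energy, theta, phi):
    """Massless four-momentum (E, px, py, pz) from energy and angles."""
    return (energy,
            energy * math.sin(theta) * math.cos(phi),
            energy * math.sin(theta) * math.sin(phi),
            energy * math.cos(theta))

def angular_factor(p, q):
    """2 p.q / (E_p E_q), equal to 2 (1 - cos theta_pq) for massless particles."""
    dot = p[0] * q[0] - p[1] * q[1] - p[2] * q[2] - p[3] * q[3]
    return max(0.0, 2.0 * dot / (p[0] * q[0]))  # guard against rounding below zero

def ecf2(particles, beta):
    """2-point energy correlation function e_2^(beta)."""
    jet_energy = sum(p[0] for p in particles)
    total = sum(p[0] * q[0] * angular_factor(p, q) ** (beta / 2.0)
                for p, q in combinations(particles, 2))
    return total / jet_energy**2

def ecf3(particles, alpha):
    """3-point energy correlation function e_3^(alpha)."""
    jet_energy = sum(p[0] for p in particles)
    total = 0.0
    for p, q, r in combinations(particles, 3):
        angles = angular_factor(p, q) * angular_factor(p, r) * angular_factor(q, r)
        total += p[0] * q[0] * r[0] * angles ** (alpha / 2.0)
    return total / jet_energy**3

def d2(particles, alpha=2.0, beta=2.0):
    """D_2^(alpha, beta) = e_3^(alpha) / (e_2^(beta))^(3 alpha / beta)."""
    return ecf3(particles, alpha) / ecf2(particles, beta) ** (3.0 * alpha / beta)
```

For example, a toy jet built from two hard prongs separated by an angle of 0.4 plus one soft emission close to one of the prongs gives a value of `d2` well below one, whereas a single hard prong accompanied by two soft wide-angle emissions gives a value well above one.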

The $D_2$ observable takes small values for a two-prong jet and large values for a one-
prong jet. Its contours in the $e_2^{(\beta)}, e_3^{(\alpha)}$ phase space are shown schematically in Fig. 10,
along with illustrative Monte Carlo generated spectra for both boosted Z jets and massive
QCD jets in $e^+e^-$ collisions. A more detailed discussion of the discrimination power of $D_2$,
as well as the details of the Monte Carlo generation, will be given in Sec. 5.


4.2 Sudakov Safety of $D_2$


One interesting feature of the $D_2$ observable is that it is not IRC safe without an explicit
cut on $e_2^{(\beta)}$. For every value of $D_2$, the contour over which the double differential cross
section is marginalized passes through the origin of the phase space, where the soft and
collinear singularities are located. This feature is shown in Fig. 10a. At every fixed order
in perturbation theory, this gives rise to an ill-defined (divergent) cross section. However, a
resummed calculation of the double differential cross section regularizes the singular region
of phase space, and leads to a finite distribution for the $D_2$ observable. This property is
referred to as Sudakov safety [70, 73]. Because Sudakov safe observables are not calculable
in fixed order perturbation theory, they do not generically have an $\alpha_s$ expansion, and we
will show that the $D_2$ spectrum exhibits a particularly interesting dependence on $\alpha_s$.
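
The way in which resummation regularizes the singular region can be illustrated with a toy integral. The sketch below is not the $D_2$ calculation: it assumes a hypothetical integrand $\alpha_s\, dz/z$ near the origin, multiplied after resummation by a double-logarithmic Sudakov factor $\exp(-c\,\alpha_s \ln^2 z)$ with a hypothetical constant $c$. The fixed-order integral grows without bound as the lower cutoff is removed, while the resummed integral is finite and equal to $\frac{1}{2}\sqrt{\pi \alpha_s / c}$, which has no expansion in integer powers of $\alpha_s$.

```python
import math

def fixed_order(alpha_s, z_cut):
    """alpha_s * integral_{z_cut}^{1} dz / z, which diverges as z_cut -> 0."""
    return alpha_s * math.log(1.0 / z_cut)

def resummed(alpha_s, c=1.0, n=20000):
    """alpha_s * integral_0^1 dz/z exp(-c alpha_s ln^2 z).

    Evaluated with the midpoint rule in the variable L = ln(1/z).
    """
    a = c * alpha_s
    l_max = 10.0 / math.sqrt(a)   # the integrand is ~exp(-100) beyond this point
    h = l_max / n
    total = sum(math.exp(-a * ((i + 0.5) * h) ** 2) for i in range(n))
    return alpha_s * h * total

def resummed_exact(alpha_s, c=1.0):
    """Closed form of the same toy integral: (1/2) sqrt(pi alpha_s / c)."""
    return 0.5 * math.sqrt(math.pi * alpha_s / c)
```

In this toy, lowering the cutoff from $10^{-3}$ to $10^{-6}$ doubles `fixed_order`, whereas `resummed` is independent of any cutoff and scales as $\sqrt{\alpha_s}$.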

The regularization of the fixed order singularity in the double differential cross section
is achieved by the all orders resummation of logarithmically enhanced terms in the
perturbative expansion. In the effective field theory description, this resummation is achieved by
renormalization group evolution, and its properties are therefore determined by the form
of the SCET anomalous dimensions. To illustrate how the $\alpha_s$ dependence arises from the
structure of the renormalization group evolution in SCET, we consider the soft subjet
factorization theorem of Sec. 3.1.2 in the leading logarithmic (LL) approximation. The cusp
pieces of the anomalous dimensions for the different functions appearing in the factorization
are given in Laplace space by (see App. C)

where we have used $\tilde{e}_3^{(\alpha)}$ to denote the Laplace conjugate to $e_3^{(\alpha)}$, and we have kept only
IR scales in the logs. Furthermore, we have kept only the terms proportional to $C_A$ so as
to resum only the physics associated with the soft subjet. The hard matching coefficient
for the soft subjet production is given by the tree level eikonal emission factor

Solving the renormalization group equations, and running all functions to the hard scale
$Q$, we then find that in the soft subjet region of phase space the multi-differential cross
section can be written to LL accuracy as

exhibiting a familiar Sudakov form.

A complete calculation of the $D_2$ spectrum requires marginalizing over both the soft
subjet and collinear subjet configurations, which we discuss in Sec. 4.4. However, to
demonstrate the $\alpha_s$ behavior in the simplest manner, we will consider just the soft subjet effective
theory. In particular, we will fix the angle of the soft subjet, but allow it to be arbitrarily
soft, so as to probe the singular region of phase space. The result is then representative of
the contribution from the soft subjet region of phase space. An exactly analogous behavior
occurs for the contribution from the collinear subjets region of phase space.

Fixing $\theta_{sj}$ to satisfy $n \cdot n_{sj} = 1/2$ (and therefore $\bar{n} \cdot n_{sj} = 3/2$), and restricting to
$\alpha = \beta$ for simplicity, the 2-point energy correlation function in the soft subjet region of
phase space is simply


(4.9) 
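
The kinematics of this choice can be checked with a few lines. The sketch below assumes light-like reference vectors $n = (1, \hat{n})$, $\bar{n} = (1, -\hat{n})$ and $n_{sj} = (1, \hat{n}_{sj})$, together with the $e^+e^-$ definition of the 2-point energy correlation function with angular factor $2 p_i \cdot p_j / (E_i E_j)$; neither convention is restated in this section, and the function names are ours. With $n \cdot n_{sj} = 1/2$ the angular factor between the soft subjet and the jet axis is unity, so for a hard parton along $n$ plus a soft subjet of energy fraction $z_{sj}$ the 2-point function is $z_{sj}(1 - z_{sj}) \approx z_{sj}$, independent of the angular exponent, up to recoil effects that are neglected here.

```python
import math

def minkowski_dot(p, q):
    """Minkowski product with signature (+, -, -, -)."""
    return p[0] * q[0] - p[1] * q[1] - p[2] * q[2] - p[3] * q[3]

def reference_vectors(n_dot_nsj):
    """Light-like n, nbar and n_sj with the requested n . n_sj (jet axis along z)."""
    cos_theta = 1.0 - n_dot_nsj
    sin_theta = math.sqrt(1.0 - cos_theta**2)
    n = (1.0, 0.0, 0.0, 1.0)
    nbar = (1.0, 0.0, 0.0, -1.0)
    n_sj = (1.0, sin_theta, 0.0, cos_theta)
    return n, nbar, n_sj

def e2_two_particles(z_sj, n_dot_nsj, alpha):
    """e_2^(alpha) for a hard parton along n plus a soft subjet along n_sj.

    The jet energy is normalized to one and recoil of the hard parton is neglected.
    """
    n, _, n_sj = reference_vectors(n_dot_nsj)
    angular = 2.0 * minkowski_dot(n, n_sj)
    return z_sj * (1.0 - z_sj) * angular ** (alpha / 2.0)
```

Calling `reference_vectors(0.5)` confirms that $\bar{n} \cdot n_{sj} = 3/2$ follows from $n \cdot n_{sj} = 1/2$, since $n + \bar{n} = (2, 0, 0, 0)$.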


The corresponding $D_2$ distribution is then obtained by marginalizing the multi-differential
cross section of Eq. (4.8)

where, in the second line, we have fixed $\theta_{sj}$ and so we do not integrate over it. Inserting
the multi-differential cross section and fixing $\theta_{sj}$, we then have

where the sj superscript denotes that this is representative of a contribution from the soft
subjet region of phase space. Importantly, because the soft subjet is defined by
requirements on IRC safe measurements, the cross section in Eq. (4.11) is a well-defined and in
principle measurable quantity.

The $\alpha_s$ dependence in this distribution of $D_2$ is very surprising. Because $D_2$ is defined
with respect to the 3-point energy correlation function, one would naively expect that $D_2$
only makes sense for a jet with at least three partons. Indeed, if we make an explicit cut
on $z_{sj}$, for example, then $D_2$ is IRC safe, and first non-zero for a jet with three partons
at $O(\alpha_s^2)$. However, because $D_2$ without a cut on $z_{sj}$ is not IRC safe, this intuition fails,
and in a fascinating way. By resumming the large logarithms of $z_{sj}$ to all orders and
then marginalizing, the $D_2$ distribution calculated in Eq. (4.11) actually starts at $O(\alpha_s)$!
Including emissions to all orders has effectively generated a non-trivial distribution for $D_2$
at one order lower in $\alpha_s$ than when it is first, naively, non-zero. Other examples of Sudakov
safe observables in the literature have expansions in $\sqrt{\alpha_s}$ [70, 73] or are even independent of
$\alpha_s$ [71-73]. To our knowledge, $D_2$ is the first example of a Sudakov safe observable for which
all-orders resummation reduces the order in $\alpha_s$ at which the observable's distribution is first
non-zero.$^9$ We re-emphasize that though the distribution of $D_2$ in Eq. (4.11) is a Taylor
series in $\alpha_s$, it is impossible in purely fixed-order perturbation theory to systematically
calculate it.

Figure 11: a) A schematic depiction of the $e_2^{(\beta)}, e_3^{(\alpha)}$ phase space in the presence of a
mass cut, along with contours of the $D_2$ observable. b) Leading order (through $\alpha_s^2$) and
next-to-leading order (through $\alpha_s^3$) distributions for the $D_2$ observable in the presence of a
mass cut as measured on hemisphere jets in $e^+e^-$ collisions.

4.3 Fixed-Order $D_2$ Distributions with a Mass Cut

Although $D_2$ is not IRC safe without a cut on $e_2^{(\beta)}$, leading to its interesting Sudakov
safe behavior, in experimental analyses a jet mass cut will always be applied. We will
therefore be most interested in this case. In Fig. 11a we show a schematic depiction of the
$e_2^{(\beta)}, e_3^{(\alpha)}$ phase space in the presence of a mass cut for $\alpha = \beta = 2$, along with contours
of the $D_2$ observable. As is indicated in the figure, the mass cut removes the origin of
the phase space, making $D_2$ IRC safe and calculable in fixed-order perturbation theory.
It is therefore interesting to study the singularity structure of the fixed-order perturbative
expansion of $D_2$ in the presence of a mass cut.

In Fig. 11b we show both the leading order ($\alpha_s^2$) (LO) and the next-to-leading order
($\alpha_s^3$) (NLO) fixed-order distributions of the $D_2$ observable as measured on the most
energetic hemisphere jet in $e^+e^- \to$ dijets events at 1 TeV center of mass energy, and
with a jet mass cut of $m_J \in [80, 100]$ GeV, in anticipation of our application to boosted
Z boson discrimination. However, the detailed range of the mass cut window is irrelevant
to the arguments of this section. NLOJet++ [114-118] was used to generate the
distributions. The fixed-order $D_2$ distribution diverges at small values, and its sign in this
region flips order-by-order, characteristic of the Sudakov region. This behavior makes clear
the necessity of resummation in the small $D_2$ region. However, importantly, there is no
divergence or other structure at large values of $D_2$. Instead, the distribution exhibits a tail
extending to large values both at LO and NLO, and this behavior is expected to persist
to higher orders. This long tail arises from the fact that the upper boundary of the phase
space is parametrically far, of distance $\sim 1/e_2^{(\alpha)}$, from the two-prong region of phase space.
A schematic depiction of the singularity structure in the $e_2^{(2)}, e_3^{(2)}$ phase space is shown
in Fig. 11a. The observation that a fixed-order singularity exists only at small values of
$D_2$ is important for the resummation of the observable in the presence of a mass cut. In
particular, while resummation in the soft subjet and collinear subjet factorization
theorems is necessary to regulate a fixed-order singularity, resummation in the soft haze
factorization theorem presented in Sec. 3.1.3 is not.

9 For observables that do not have universal behavior in the ultraviolet [73].

The fixed-order behavior of the $D_2$ observable is in some ways much more similar to
that of a traditional jet or event shape than might naively be expected. However, there
are some important differences. In particular, a mass cut of $80 < m_J < 100$ GeV has been
applied, which is comparable to the location of the Sudakov peak in the mass for a jet of
energy 500 GeV. Therefore, unlike in the case of a traditional jet shape, where there is a
transition from a region where resummation is important to a far tail region where a fixed
order calculation provides an accurate description, in this case, for all values of $D_2$, there is
an overall Sudakov suppression due to the mass cut, in addition to the divergence at small
values of $D_2$. This is, however, a small effect in the fixed order distribution compared to
the divergence at smaller values and, most importantly, does not require regularization, as
it is regulated by the mass cut.
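
For orientation, the mass cut can be translated into a window in the 2-point energy correlation function. The short sketch below assumes the relation $e_2^{(2)} = m_J^2/E_J^2$, which holds for the $e^+e^-$ definition of the energy correlation functions with massless constituents (an input from outside this section), and takes $E_J = 500$ GeV for a hemisphere jet at 1 TeV center of mass energy. The function name is ours.

```python
def e2_window(m_low, m_high, jet_energy):
    """Map a jet mass window (GeV) onto the corresponding e_2^(2) window."""
    return (m_low / jet_energy) ** 2, (m_high / jet_energy) ** 2

e2_min, e2_max = e2_window(80.0, 100.0, 500.0)

# Parametric extent of the D_2 tail, ~ 1 / e_2, at either edge of the mass window.
tail_at_low_mass = 1.0 / e2_min
tail_at_high_mass = 1.0 / e2_max
```

Under these assumptions the mass window corresponds to $0.0256 < e_2^{(2)} < 0.04$, so that, up to factors of order one, the tail of the $D_2$ distribution is expected to extend to values of order 25 to 40.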

4.4 Merging Factorization Theorems 

A complete description of the $D_2$ observable for background jets requires combining the
three factorization theorems presented in Sec. 3. This involves both the merging of the
soft subjet and collinear subjets factorization theorems, which must be performed before
the marginalization over the $D_2$ contours, as well as the matching between the small $D_2$
description of the resolved two-prong region and large $D_2$ description of the unresolved
region. We will discuss how the matching is accomplished for these two cases in turn.

4.4.1 Merging Soft and Collinear Subjets

The region of phase space in which two subjets are resolved by the measurement is described
by two distinct factorization theorems. These two regions of phase space are separated by
the measurement of the two 2-point energy correlation functions, $e_2^{(\alpha)}$ and $e_2^{(\beta)}$. However, in
the calculation of $D_2$, both regions are treated as two-pronged, and the additional 2-point
energy correlation function must be marginalized over. Since each effective theory can only
be used within its regime of validity, a merged description, valid in both the soft subjet
and collinear subjets regions of phase space, is required. To accomplish this, we introduce
a novel procedure for merging the two factorization theorems.

At a fixed $e_3^{(\alpha)}$, the soft subjet and collinear subjets fill out the $e_2^{(\alpha)}, e_2^{(\beta)}$ phase space,
which was shown in Fig. 5c. This phase space has also been studied in the context of two
angularities measured on a single jet in Refs. [74, 75]. In that case factorization theorems
involving only collinear and soft modes exist on the boundaries of phase space, and an
additional collinear-soft mode is required in the bulk of phase space. New logarithms exist
in the bulk of the phase space, so-called $k_T$ logarithms [74], which can either be captured by
the additional collinear-soft mode proposed in Ref. [75], or by the interpolation procedure
of Ref. [74]. In that case, the factorization theorems involving only the collinear and soft
modes do not extend beyond the boundaries of the phase space, and they cannot be directly
matched onto one another, as this would neglect the resummation of the $k_T$ logarithms,
which are not present in either factorization theorem. We will now argue that the case of
interest in this paper, namely of two resolved subjets, is different. In particular, the soft
subjet and collinear subjets factorization theorems extend from the boundaries of phase
space, and already contain all the modes required for a description in the bulk of the phase
space; no additional modes exist in the bulk region of the phase space. This
implies that a description of the entire phase space region can be obtained by
a proper merging of the collinear subjets and soft subjet factorization theorems, which is
the approach that we will take.

To see that no additional modes are present in the bulk of the phase space, it is
sufficient to look for modes which transition between the modes present in the effective
theory descriptions in the soft subjet and collinear subjets regions of phase space, and which
contribute at leading power. When transitioning from the collinear subjets region of phase
space to the soft subjet region of phase space, as is shown schematically in Fig. 12a, the
collinear modes of one of the jets become the soft subjet and boundary soft modes of the soft
subjet factorization theorem, while the collinear-soft modes transition to the
global soft modes. One could possibly be concerned that there exist additional
modes which appear as collinear-soft modes on the boundary of phase space where the
collinear subjets exist, but which transition to soft subjet modes instead of global soft
modes. However, one can immediately see that such modes cannot exist, since the energy
fraction of the soft subjet modes is set by the $e_2$ measurement, while the energy fraction of
the collinear-soft modes is set by the $e_3$ measurement. Since $e_3$ is fixed, and the transition is
occurring only in the $e_2^{(\alpha)}, e_2^{(\beta)}$ phase space, such modes cannot exist. This implies that all
contributing modes already exist in either the soft subjet or collinear subjets factorization
theorems. This is a crucial difference from the case of the double differential angularities,
which in some sense simplifies the analysis. Since no additional modes exist in the bulk of
the phase space, the factorization theorems can be extended from the boundaries, and can
be matched onto each other. This will allow for the resummation of all large logarithms.
We will now discuss in more detail our implementation of this matching, after which we
will see that our argument for the absence of additional modes, presented here based on
power counting, is explicitly realized through our merging procedure.

This suggests the procedure we will use for interpolating between the collinear
subjets and soft subjet factorization theorems, as sketched in Ref. [76], where the soft subjet
factorization theorem was originally introduced. It proceeds by implementing a zero bin
subtraction [119] in factorization theorem space (the meaning of this will become clear
shortly) to remove double counting in the overlapping region between the effective theories.
This is a non-trivial and novel example of the zero bin procedure, and demonstrates the
general utility of the approach.

Figure 12: a) A schematic depiction of the transition between the soft subjet and collinear
subjets regions of phase space. b) Distribution of the energy fraction of the gluon subjet
as predicted by the collinear subjets effective theory, the soft subjet effective theory, and
the merged description. The collinear zero bin of the soft subjet is also shown.

Recall that in a standard SCET factorization, the cross section is written as a
convolution of a jet function, which describes the collinear physics, and a soft function, which
describes the soft physics. To achieve this mode separation without introducing a double
counting, the soft limit of the jet function must be subtracted, which is referred to in the
literature as a zero bin subtraction. Here we extend this approach to the case of two distinct
factorization theorems which describe different regions of a multi-differential phase space,
the soft subjet and collinear subjets effective field theories, but which overlap in the bulk of
the two-prong phase space. It is important that here we only focus on the two-prong region
of phase space; the matching to the one-prong region of phase space will be discussed in
Sec. 4.4.2. To perform the matching in the two-prong region of phase space, inspired by
the zero-bin procedure, we will write the cross section as a sum of the contributions from
the soft subjet factorization theorem and the collinear subjets factorization theorem, with
a zero bin contribution to remove the overlap between the effective theories. Explicitly, we
write

$$\sigma = \left(\sigma_{sj} - \sigma_{sj|cs}\right) + \sigma_{cs}\,, \tag{4.12}$$

where we have suppressed that at this stage the cross section is still differential in the
kinematics of the subjets, so that our notation is not overly cumbersome. The cross
sections in the soft subjet and collinear subjets regions of phase space are denoted by sj
and cs subscripts, respectively. Here the zero bin contribution, which removes the double
counting, is given by $\sigma_{sj|cs}$. Explicitly, $\sigma_{sj|cs}$ is obtained by taking the limit of the soft
subjet factorization theorem in the power counting of the collinear subjets factorization
theorem. The anomalous dimensions and one-loop matrix elements for the collinear zero bin
of the soft subjet factorization theorem are given in App. D. Each of the three contributions
to the cross section given in Eq. (4.12) is associated with its own factorization theorem.
However, the contributions to the cross section with the clearest physical interpretation
are $\sigma_{cs}$ and the combined term $(\sigma_{sj} - \sigma_{sj|cs})$, which we will refer to as the zero bin
subtracted soft subjet contribution. It is the contribution which can be interpreted over
the entire phase space as the contribution from a soft subjet, and all logarithms contained
in this expression are of soft scales.
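
The additive structure of Eq. (4.12) can be illustrated with a toy example. The functions below are hypothetical shapes chosen only so that the two descriptions share a common overlap; they are not the factorized cross sections of this paper. The toy soft description is $1/z$ times the eikonal factor of a back-to-back dipole, the toy collinear description is a $q \to qg$ splitting shape times the collinear pole, and the zero bin is the collinear limit of the soft description. The sketch is meant for emission angles in the hemisphere of the jet axis.

```python
import math

def soft_subjet(z, theta):
    """Toy soft description: 1/z times the eikonal factor of a back-to-back dipole."""
    c = math.cos(theta)
    return (1.0 / z) * 2.0 / ((1.0 - c) * (1.0 + c))

def collinear_subjets(z, theta):
    """Toy collinear description: splitting-function shape times the collinear pole."""
    c = math.cos(theta)
    return (1.0 + (1.0 - z) ** 2) / (2.0 * z) / (1.0 - c)

def collinear_zero_bin(z, theta):
    """Collinear limit (1 + cos(theta) -> 2) of the toy soft description."""
    c = math.cos(theta)
    return (1.0 / z) / (1.0 - c)

def merged(z, theta):
    """Additive merging with the overlap removed, in the form of Eq. (4.12)."""
    return (soft_subjet(z, theta) - collinear_zero_bin(z, theta)) + collinear_subjets(z, theta)
```

In this toy, `merged` approaches `soft_subjet` as $z \to 0$ at fixed angle, because the collinear description then coincides with the zero bin, and approaches `collinear_subjets` as $\theta \to 0$ at fixed $z$, because the soft description then coincides with the zero bin.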

We specifically subtract the collinear-bin of the soft subjet factorization, and not the 
soft-bin of the collinear factorization. This is due to the need to cancel the contributions 
from the boundary soft modes of the soft subjet factorization in the collinear region. Since 
no analogous mode to the boundary softs is found in the collinear resummation, any soft 
expansion would miss this contribution, resulting in a logarithm being resummed in an 
inappropriate collinear region of phase space. This is in contrast to what happens when 
comparing the two subtractions in the soft region. So long as one uses the relative transverse
momentum of the subjets as the splitting scale of the collinear factorization, the collinear-
bin of the soft subjet does match the soft-bin of the collinear factorization in the soft
region. This is the result of the merging of various soft scales. In the soft subjet collinear-bin,
the expanded boundary softs and global soft scales naturally merge, and in the soft-bin
of the collinear jets, the global softs and collinear-softs also naturally merge in the soft 
region. This can be explicitly verified with the canonical scales given in App. G. Thus the 
collinear-bin of the soft subjet is the appropriate subtraction throughout phase space, to 
remove double counting at all points. 

Having defined our merging procedure, implemented through the zero bin, we can 
now revisit our argument for the absence of additional modes, previously given by power 
counting, which can be verified from an explicit calculation. Taking the collinear-bin of 
the soft subjet factorization, and the soft-bin of the collinear subjet factorization, one 
finds identical fixed order expressions, as well as a one-to-one mapping of the anomalous
dimensions between these two re-expanded factorizations. The merging of the soft
scales in the “bins” of the primary factorizations as one enters the soft region then implies
that they are numerically equivalent. No new logarithms appear in the bulk of phase space,
unlike the case of two angularities [74]. This emphasizes that the collinear-soft region is 
a genuine overlap between the factorizations, with no new structures not already found in 
the factorizations. 

To see visually the effect that this matching has, it is interesting to look at the distribution
of the energy fraction of one of the subjets. In Fig. 12b, we plot the distribution
of the gluon subjet's energy fraction as computed in the collinear subjets and soft subjet
factorization theorems, as well as the energy spectrum for the matched cross section of
Eq. (4.12) and the zero bin contribution. The energy spectrum is cumulative in $D_2 < 2$, which
is the majority of the two-prong region, and for simplicity we have fixed the jet mass
$m_J = m_Z$. The matched contribution smoothly interpolates between the spectrum for the
collinear subjets at large values of $z$, where the collinear subjets factorization theorem is
valid and captures all logarithms of the splitting angle, and that for the soft subjet factorization
theorem at small values of $z$, accurately resumming large logarithms of $z$. It is
also important to note that for large $z$, the zero bin contribution matches exactly onto the
soft subjet contribution, removing its contribution in this region. One can also see that
the collinear-bin of the soft subjets cancels the collinear contribution to the soft region, up
to power corrections, as argued above. We find that the collinear subjets provides a good
description over a large range of values, with the soft subjet factorization theorem only
required at small values of $z$.

Figure 13: a) Distribution of the energy fraction of the gluon subjet as predicted by the
collinear subjets effective theory, the soft subjet effective theory, the collinear zero bin, and
the matched description. A zoomed version at small $z$ is shown in b).

In Fig. 13a, we show the energy spectra at cumulative D 2 < 0.6, along with a zoomed 
version at small values of z, in Fig. 13b. This figure makes clear that our matched
prediction, computed using our zero-bin approach, reproduces correctly the behavior of 
the collinear subjets at large values of z, and the soft subjet factorization theorem at small 
values of z. In particular, in Fig. 13b, we see that below z ~ 0.05, the soft subjet and 
matched predictions are indistinguishable. 

Although we will not study this case explicitly in this paper, we have also performed 
the matching for gluon jets, where the dominant contribution comes from $g \to gg$ splitting.
This case is somewhat interesting due to the fact that the Bose symmetry of the final 
gluons guarantees that the z distribution is symmetric about z = 0.5, leading to peaks in 
the z distribution due to soft singularities at both z = 0 and z = 1. Nevertheless, the same 
matching procedure works identically in this case, and this procedure could therefore also 
be straightforwardly applied for studying substructure in gluon jets, as would be required 
for a complete calculation at the LHC.
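
The symmetry of the gluon energy spectrum can be checked directly on the leading-order splitting function. The sketch below assumes the standard unregularized Altarelli-Parisi form $P_{g \to gg}(z) = 2 C_A (1 - z(1-z))^2 / (z(1-z))$, which is not written out in this section; the function name `p_gg` is illustrative.

```python
C_A = 3.0  # SU(3) adjoint Casimir


def p_gg(z: float) -> float:
    """Unregularized LO g -> gg splitting function (assumed standard form)."""
    if not 0.0 < z < 1.0:
        raise ValueError("z must lie strictly between 0 and 1")
    u = z * (1.0 - z)
    return 2.0 * C_A * (1.0 - u) ** 2 / u


if __name__ == "__main__":
    for z in (0.01, 0.1, 0.3, 0.5):
        # Symmetric under z <-> 1 - z, and enhanced towards both endpoints.
        print(f"z={z:4.2f}  P(z)={p_gg(z):9.3f}  P(1-z)={p_gg(1.0 - z):9.3f}")
```

The function depends on $z$ only through the combination $z(1-z)$, which makes the symmetry about $z = 0.5$ and the soft enhancement at both $z \to 0$ and $z \to 1$ explicit.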

We have shown here the matched subjet energy spectra for the particular choice of jet 
radius R = 1 at a center of mass energy of 1 TeV for quark jets, as this is the particular 
case that we will focus on throughout the rest of the paper. However, we have investigated 
the properties of the matching away from these parameters. It is important to note that 
our procedure for merging factorization theorems must be carefully treated at small $R$. This
manifests itself as a breakdown in the zero bin procedure. In particular, for a fixed value
of $e_2^{(\alpha)}$, if $R$ is small, then the power counting $e_2^{(\alpha)} \sim z_{sj}$ is invalidated. In other words, for
small $R$ there does not exist a region of phase space which contributes to $e_2^{(\alpha)}$ for which $z_{sj}$
is sufficiently small that the soft subjet expansion is valid.

We can bound the specific $R$ that eliminates the soft subjet region by considering the
minimum energy fraction accessible to a subjet at a fixed $e_2^{(\alpha)}$:

$$ z_{\min} \sim \frac{e_2^{(\alpha)}}{\left(2 \sin\frac{R}{2}\right)^{\alpha}} \,. $$

As a necessary condition for a soft subjet, one must fulfill the condition: 
(4.14)


and so $R \sim 1$ for the soft subjet to contribute. To eliminate the soft subjet then requires
$R \ll 1$, and to still have valid collinear subjet regions requires that $R$ and $e_2^{(\alpha)}$ are related
as:

$$ 1 \gg R^{\alpha} \gg e_2^{(\alpha)} \,. $$ (4.15)

Finally, one should distinguish a fixed mass jet from a fixed $e_2^{(\alpha)}$ jet. In the case $\alpha = 2$, since
$e_2^{(2)} = m_J^2 / E_J^2$, by varying $E_J$ or $R$, we can open or close the soft subjet region.
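
To get a feeling for the numbers, the closing of the soft subjet region can be estimated numerically. The sketch below rests on two assumptions that should be checked against the full paper: the scaling estimate $z_{\min} \sim e_2^{(\alpha)} / (2 \sin(R/2))^{\alpha}$, and the relation $e_2^{(2)} = m_J^2 / E_J^2$ for a jet of massless constituents. The function names are illustrative, and the inputs are the $m_J = 90$ GeV, $E_J = 500$ GeV jets considered in this section.

```python
import math


def e2_from_mass(m_jet: float, e_jet: float) -> float:
    """e_2^(2) of a jet of massless constituents: m_J^2 / E_J^2 (assumed relation)."""
    return (m_jet / e_jet) ** 2


def z_min(e2: float, radius: float, alpha: float = 2.0) -> float:
    """Scaling estimate z_min ~ e_2 / (2 sin(R/2))^alpha (assumed form)."""
    return e2 / (2.0 * math.sin(radius / 2.0)) ** alpha


if __name__ == "__main__":
    e2 = e2_from_mass(90.0, 500.0)
    for radius in (0.2, 0.5, 0.7, 1.0, 1.2):
        print(f"R={radius:3.1f}  z_min ~ {z_min(e2, radius):.3f}")
```

With these inputs the estimate is at the few-percent level for $R = 1$ and becomes of order one for $R = 0.2$, so that no parametrically soft subjet fits inside a small-radius jet. Since this is a scaling relation, only the trend with $R$ is meaningful, not the precise values.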

This appears in the zero bin by the fact that the zero bin subtraction is greater in all
regions than the soft subjet, leading to a negative total cross section. We find numerically
that this occurs for $R < 0.5$ for the case of $m_J = 90$ GeV and $Q = 1$ TeV. This value
depends fairly sensitively on $m_J$ and $Q$, or equivalently $e_2^{(2)}$. In this case, only the collinear
subjets factorization theorem should be used, and it is valid throughout the entire available
phase space. In this paper we focus primarily on the case of fat jets, defined with $R = 1$,
and therefore it is necessary to perform the matching between the soft subjet region and
the collinear subjets region for jets of energy 500 GeV. However, in Sec. 5.3, we perform
a brief survey of different $R$ values, comparing our analytic predictions with distributions
from Monte Carlo generators. A more phenomenological study of the importance of the
matching for different physics processes of interest for an $e^+e^-$ collider, the LHC, or even a
possible 100 TeV collider, where even higher boosts can be achieved, would be interesting,
but is well beyond the scope of our initial investigation and can be straightforwardly treated
using our techniques.

While we have used a zero bin procedure to perform the matching between the collinear 
subjets and soft subjet factorization theorems, it is also possible to develop a dedicated 
effective field theory valid when the soft subjet becomes collinear. This effective field theory
is related to our zero bin contribution, and has been developed in Ref. [120]. While we
believe that this approach is nice in principle, for the observable $D_2$, we find that such an
effective field theory has a vanishing region of validity, as can be seen from the zero bin
contribution in Fig. 12b, and Figs. 13a and 13b. We therefore believe that our use of the 
zero bin, as generalized to distinct factorization theorems, represents a natural approach 
to the merging of the distinct factorization theorems. However, we acknowledge that this 
is an observable dependent statement, and there may be cases where there is a sufficiently 
large region of overlap between the soft subjet and collinear subjets effective theories, and 
in this case it might prove useful to have a separate effective field theory description which
is valid in the case that the soft subjet becomes collinear. 

4.4.2 Matching Resolved to Unresolved Subjets 

An important feature of the D 2 observable is that its contours respect the parametric 
scaling of the phase space, as emphasized in Fig. 10. This implies that the marginalization 
over the contours defining the observable can be performed at small $D_2$ entirely within the
merged effective theory of Sec. 4.4.1, and at large $D_2$ within the soft haze effective field
theory. Hence the matching between these two different descriptions can be performed 
at the level of the D 2 distribution instead of at the level of the double differential cross 
section, which is a great simplification, and primary feature of the D 2 observable. 
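
Schematically, the marginalization referred to above projects the double differential cross section onto contours of fixed $D_2$. The expression below is a sketch, assuming equal angular exponents so that $D_2 = e_3 / (e_2)^3$ (the definition is given earlier in the paper, not in this section):

```latex
\begin{equation}
\frac{d\sigma}{dD_2}
  = \int de_2\, de_3\;
    \delta\!\left(D_2 - \frac{e_3}{(e_2)^{3}}\right)
    \frac{d^2\sigma}{de_2\, de_3}\,.
\end{equation}
```

Because the contours follow the parametric scaling of the phase space, the integrand at small $D_2$ is supplied entirely by the merged two-prong description, and at large $D_2$ entirely by the soft haze description.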

The soft haze factorization theorem presented in Sec. 3.1.3 first contributes to the
shape of the $D_2$ distribution at two emissions, the first order at which $e_3$ can be non-zero
(technically at next-to-next-to-leading logarithmic prime order, NNLL', in the logarithmic
counting). Since our focus is on an initial investigation of the factorization properties of
two-prong discriminants, the necessary two-loop calculation is beyond the scope of this
paper. Naively, this implies that since the merged effective field theory describing the
two-prong region of phase space is only valid for $D_2 < 1$, our predictions should not be
extended beyond $D_2 < 1$. However, we will argue that because of the structure of fixed
order singularities for the $D_2$ observable, extending our two-prong factorization theorems
to large $D_2$ will provide an accurate description of the $D_2$ distribution for a wide range of
$E_J$ and $R$.

As shown in Sec. 4.3, there does not exist a fixed order singularity at large $D_2$. In
particular, this implies that if extended into this region, the factorization theorems valid at 
small D 2 will not diverge. Furthermore, one in fact expects that they provide a reasonable 
description of the shape. They contain both an overall Sudakov factor for the $e_2^{(\alpha)}$ scale of
the jet, and also provide a description of the internal structure of the jet in terms of splitting 
functions (in the case of the collinear subjets factorization). While the splitting function 
does not exactly reproduce the matrix elements in the soft haze factorization theorem, it 
provides a good description of them. We believe that this is a consistent approach which 
suffices for this initial investigation. 

Perhaps the most important fixed order correction not captured in the subjet factorization
for $D_2$ is simply the endpoint of the distribution, which arises from the kinematic
boundaries of the phase space. Since we will normalize our distributions to 1, in order to
compare to the Monte Carlo generators, the height of the peak is correlated with the endpoint.
Matching to the soft haze region would give the resummed distribution the correct
endpoint in the tail, and thus can shift the peak up in general. This endpoint is sensitive
to the specific $R$ and $E_J$ of the jet, as well as to the values of the angular exponents $\alpha$
and $\beta$. Recall that since the Monte Carlo generators respect momentum conservation,
they always terminate their distributions before the physical endpoint of the spectrum.
We will also see how this disagreement in the tail region changes as a function of $R$ and
$E_J$ in Secs. 5.3 and 5.4 respectively. However, for the case of dijets produced at a center
of mass energy of 1 TeV, with a jet mass cut of $80 < m_J < 100$ GeV, as is relevant for
boosted boson discrimination, and on which we primarily focus throughout this paper, we 
will see that this discrepancy in the tail region is minimal, and we will find good agreement 
between our analytic calculations and the Monte Carlo predictions. It would of course be 
interesting to perform the complete two-loop calculation in the soft haze region of phase 
space; however, we believe that this would have a minor effect for a substantial range of 
parameter space. Nevertheless, the proper inclusion of this region of phase space would 
also be interesting from a resummation perspective, as it would require matching between 
two distinct factorization theorems involving a different number of resolved jets, instead of 
the more familiar case of matching a resummed distribution to a fixed order calculation. 
We leave further investigations of this to future work. 

5 Numerical Results and Comparison with Monte Carlo 

We now present numerical results for signal and background distributions for the $D_2$ observable
in $e^+e^-$ collisions. We give a detailed comparison with Monte Carlo, at parton
level in Secs. 5.1 through 5.4 and including hadronization in Sec. 5.5. We then study the
discrimination power of $D_2$ analytically in Sec. 5.6, and comment on the optimal choice of
angular exponents. In Sec. 5.7 we discuss possible observables which go beyond $D_2$ and separately
resolve the soft subjet and collinear subjets regions of phase space, and how these could be
used for possible improvements to boosted boson discrimination.

Throughout this section we use FastJet 3.1.2 [93] and the EnergyCorrelator FastJet
CONTRIB [93, 94] for jet clustering and analysis. All jets are clustered using the $e^+e^-$
anti-$k_T$ metric [93, 99] using the WTA recombination scheme [72, 100], with an energy
metric.$^{10}$
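
For readers who want to reproduce the observable on a small set of four-vectors without the FastJet contrib, the following is a minimal pure-Python sketch. It assumes the $e^+e^-$ energy correlation function definitions from earlier in the paper, in which the pairwise angular factor is $(2 p_i \cdot p_j / (E_i E_j))^{\beta/2}$ and $D_2^{(\alpha,\beta)} = e_3^{(\alpha)} / (e_2^{(\beta)})^{3\alpha/\beta}$. All function names are illustrative and this is not the EnergyCorrelator implementation.

```python
import itertools
import math


def _angle_factor(p, q, beta):
    """(2 p.q / (E_p E_q))^(beta/2) for four-vectors stored as (E, px, py, pz)."""
    dot = p[0] * q[0] - p[1] * q[1] - p[2] * q[2] - p[3] * q[3]
    # Clamp tiny negative values caused by rounding for nearly collinear pairs.
    return max(2.0 * dot / (p[0] * q[0]), 0.0) ** (beta / 2.0)


def e2(particles, beta=2.0):
    """Two-point energy correlation function, normalized by E_J^2."""
    e_jet = sum(p[0] for p in particles)
    total = sum(p[0] * q[0] * _angle_factor(p, q, beta)
                for p, q in itertools.combinations(particles, 2))
    return total / e_jet ** 2


def e3(particles, beta=2.0):
    """Three-point energy correlation function, normalized by E_J^3."""
    e_jet = sum(p[0] for p in particles)
    total = 0.0
    for p, q, r in itertools.combinations(particles, 3):
        total += (p[0] * q[0] * r[0]
                  * _angle_factor(p, q, beta)
                  * _angle_factor(p, r, beta)
                  * _angle_factor(q, r, beta))
    return total / e_jet ** 3


def d2(particles, alpha=2.0, beta=2.0):
    """D_2^(alpha, beta) = e_3^(alpha) / (e_2^(beta))^(3 alpha / beta)."""
    return e3(particles, alpha) / e2(particles, beta) ** (3.0 * alpha / beta)


def massless(energy, theta, phi):
    """Massless four-vector (E, px, py, pz) from energy and polar angles."""
    return (energy,
            energy * math.sin(theta) * math.cos(phi),
            energy * math.sin(theta) * math.sin(phi),
            energy * math.cos(theta))


if __name__ == "__main__":
    jet = [massless(250.0, 0.1, 0.0), massless(200.0, 0.3, 2.0), massless(50.0, 0.6, 4.0)]
    print("e2 =", e2(jet), " e3 =", e3(jet), " D2 =", d2(jet))
```

The triple sum scales as the cube of the number of constituents, which is acceptable for an illustration but is one reason a dedicated implementation is preferable for real analyses.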

5.1 Comparison with Parton-Level Monte Carlo 

Previous studies of boosted boson discrimination with ratios of IRC safe jet observables 
have relied entirely on Monte Carlo simulations. While the implementations of both the
perturbative shower and hadronization are well-tuned to describe simple event-wide observables,
jet substructure observables probe significantly more detailed correlations. For
the particular case of observables sensitive to two-prong structure, their discrimination 
power is sensitive to the description of massive QCD jets in the phase space region where 

10 We thank Jesse Thaler for use of a preliminary version of his code for WTA in $e^+e^-$ collisions. This
code is now available in the FastJet CONTRIB. 
the jets are dominated by a resolved splitting. One might naively expect that this region
of phase space is sensitive to the implementation of the parton shower model, and we will 
see that this is indeed the case. 

While a comparison to recent LHC data on jet substructure observables (for example: 
[11, 12, 14, 25, 26, 31]) is possible, the lack of analytic calculations means that it is difficult 
to disentangle perturbative from non-perturbative effects. In this section we compare the 
results of our analytic calculation for D 2 with a number of Monte Carlo generators at parton 
level, focusing in particular on the small $D_2$ region.$^{11}$ This allows for a detailed probe of the
simulation of two-prong jets in QCD by the perturbative shower (for a discussion of some
other variables, see Refs. [121, 122]). A large number of implementations of the perturbative
shower exist, and are available in popular Monte Carlo generators (for reviews, see e.g.
[123-127]). Some examples include Pythia [97, 98], a $p_T$-ordered dipole shower; Vincia
[85-90], Sherpa [128, 129], Ariadne [130], and Dire [131], dipole-antenna showers; and
Herwig++ [132-135], an angular-ordered dipole shower.$^{12}$

As representative of these different Monte Carlo shower implementations, we will use 
the following Monte Carlo generators throughout this section: 

• Pythia 8.205 

• Vincia 1.2.01 with a $p_T$-ordered shower

• Vincia 1.2.01 with a virtuality-ordered shower 

• HERWIG++ 2.7.1 

All Monte Carlos were showered with default settings except for the caveats listed below
and requiring two-loop running of $\alpha_s$ in the CMW scheme [137, 138] with
$\alpha_s(m_Z) = 0.118$. The different shower evolution variables within the Vincia Monte Carlo enable a
study of their effects. For background distributions, we generate $e^+e^- \to$ dijets at 1 TeV
center of mass energy and study the highest energy $R = 1.0$ anti-$k_T$ jet in the event. For
signal distributions in Pythia and Vincia, we generate $e^+e^- \to ZZ$ events with both $Z$s
decaying hadronically. For Herwig, the fixed-order signal distributions are generated in
MadGraph5 2.1.2 [139] and showered in Herwig. All jets are required to have a mass
in the window $m_J \in [80, 100]$ GeV. In all plots shown in this section, hadronization has
been turned off in all Monte Carlos. Fixed-order matching was also turned off in Vincia.

Fig. 14 compares our analytic prediction for the $D_2^{(2,2)}$ spectrum to the parton-level
Monte Carlo simulations in both background (Fig. 14a) and signal (Fig. 14b) samples. 
The details of the scale variations used to make the uncertainty bands will be explained in 

11 One should always be wary of comparisons of Monte Carlo generators at parton level which employ 
different hadronization models. Our comparisons at parton level presented in this section are to set the 
stage for fully hadronized comparisons in the following section. However, we take the view that a parton 
shower should achieve, to the greatest extent possible, a clean separation between perturbative and non- 
perturbative physics, and therefore should provide an accurate description of observables both at parton 
and hadron level. 

12 Herwig++ also has the option for a dipole-antenna shower implementation [136] though we will not
use it here. 


[Figure 14, panels (a) and (b): parton-level $D_2^{(2,2)}$ spectra at 1 TeV with $m_J \in [80, 100]$ GeV and $R = 1$, for (a) quark jets in $e^+e^- \to$ dijets and (b) boosted $Z$ jets in $e^+e^- \to ZZ$. Curves: Analytic, Pythia 8, Vincia $p_T$-ordered, Vincia virtuality-ordered, Herwig++.]

Figure 14: A comparison of our analytic prediction for $D_2^{(2,2)}$ with the parton-level
predictions of the Pythia, Vincia and Herwig Monte Carlos. a) The $D_2$ distributions
as measured on QCD background jets. b) The $D_2$ distributions as measured on
boosted $Z$ boson signal jets. The solid line is the central value of our analytic calculation
and the shaded bands are representative of perturbative scale variations. The pinch in the
scale variations is a consequence of unit normalizing the distributions.


Sec. 5.2, but the pinch in the uncertainties should not be taken as physical. The pinch comes 
from unit normalizing the distributions, and is common in analyses in which scale variations 
are applied to normalized distributions (see, e.g., Refs. [61, 62, 110]). All Monte Carlos have 
similar distributions as measured on signal jets, though Herwig is more peaked at small 
values than the other generators. Our analytic prediction, shown with perturbative scale 
variation, agrees well with the Monte Carlo generators. On background jets, however, the 
distributions are distinct, especially at small values of D 2 . Small D 2 is the region where the 
jet has a two-prong structure, but unlike for signal jets, for background jets that structure 
is not generated by a hard matrix element. In the case of collinear subjets, it is generated 
by a hard splitting function, while for a soft subjet, it is generated by an eikonal emission. 
In the Monte Carlos, small D 2 is the region that is most sensitive to the cutoff effects 
and other infrared choices. As we will show in following sections, by adjusting unphysical 
infrared scales, differences between the Monte Carlos at parton level can be reduced and 
essentially eliminated. 
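
The origin of the pinch in unit-normalized uncertainty bands, mentioned at the start of this discussion, can be illustrated with a toy model. The sketch below uses a hypothetical peaked shape, `toy_spectrum`, in place of the actual resummed prediction, with its width standing in for a scale choice. Two curves that each integrate to one cannot lie above one another everywhere, so they must cross, and the band between them narrows at the crossing.

```python
import math


def toy_spectrum(x, width):
    """Hypothetical peaked shape standing in for one scale choice."""
    return x * math.exp(-x / width)


def unit_normalize(values, dx):
    """Divide by the trapezoid-rule integral so the sampled curve integrates to 1."""
    area = dx * (sum(values) - 0.5 * (values[0] + values[-1]))
    return [v / area for v in values]


def crossing_points(curve_a, curve_b, grid):
    """Grid points at which curve_a - curve_b changes sign."""
    diffs = [a - b for a, b in zip(curve_a, curve_b)]
    return [grid[i] for i in range(1, len(diffs)) if diffs[i - 1] * diffs[i] < 0.0]


N, X_MAX = 400, 12.0
DX = X_MAX / N
GRID = [i * DX for i in range(N + 1)]

# "Down" and "up" variations of the toy width parameter.
DOWN = unit_normalize([toy_spectrum(x, 0.8) for x in GRID], DX)
UP = unit_normalize([toy_spectrum(x, 1.25) for x in GRID], DX)

if __name__ == "__main__":
    print("normalized curves cross near x =", crossing_points(DOWN, UP, GRID))
```

The location of the pinch is therefore a property of the normalization, not an indication that the perturbative uncertainty is small at that value of the observable.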

For reference, in App. I we show a collection of $e_2^{(2)}$ distributions at both parton and
hadron level for each of the different Monte Carlo generators. Since $e_2^{(2)}$, which is related to
the jet mass by Eq. (2.8), is set by a single emission, the agreement between the different
generators, particularly at parton level, is significantly better than for the $D_2$ observable.
This further emphasizes the fact that the D 2 observable offers a more differential probe of 
the perturbative shower, going beyond the one emission observables on which Monte Carlo 
generators have primarily been tuned. 
In the following sections we will study the partonic $D_2$ distributions in more detail. We
will restrict ourselves to comparing and contrasting $p_T$-ordered Vincia and Pythia for a
few reasons. First, as exhibited in Fig. 14a, these Monte Carlos represent the largest spread 
in their predicted D 2 spectra. Herwig, while it performs very similarly to Vincia, has a 
different hadronization model than Pythia and Vincia. So, directly comparing Pythia 
and Vincia minimizes any implicit hadronization effects when comparing the Monte Carlos 
at parton level. There are still differences due to the cutoff of the perturbative shower, 
which will be discussed in Sec. 5.2. 

5.2 Monte Carlos and Perturbative Scale Variation 

The fact that, in particular, the $p_T$-ordered Vincia distribution for $D_2$ as measured on
background agreed with our calculation while the Pythia distribution disagreed in the 
small D 2 region can be understood and quantified further. The bulk of the disagreement 
between our analytic calculation and Pythia, illustrated in Fig. 14a, occurs near the 
peak of the D 2 distribution. It is well-known that for many observables perturbative 
uncertainties tend to be significant in the peak region of the distribution. Therefore, it is 
possible that the difference between the $p_T$-ordered Vincia and Pythia $D_2$ distributions
can fully be explained by large perturbative uncertainties. In this section, we will show 
that by adjusting the cutoff of the parton level shower, the differences between Vincia and 
Pythia can be significantly reduced. 

To estimate perturbative uncertainties in our resummed analytic calculation, the standard
procedure is to vary the scales that appear in the calculation by factors of 2. This is
at the very least a proxy for the sensitivity of the cross section to these scales. Because
our factorization theorems contain many functions, as well as merging of distinct factorization
theorems, in principle there are numerous scales that could be varied, a complete
analysis of which is beyond the scope of this paper. A complete list of the variations considered,
as well as the resummation procedure, can be found in App. G, while here we only
summarize. In all factorization theorems, we vary the subjet splitting scales, the in-jet
soft radiation scales, the out-of-jet soft radiation scales, as well as where the freeze-out
for the Landau pole occurs in the running of $\alpha_s$. We do not separately vary the scale in
the soft subjet factorization theorem and the collinear zero bin to ensure that the zero bin
subtraction is implemented correctly. The scale variation band for the total cross section
is then taken as the combined band for all possible combinations of these scale variations.
The soft subjet cross section displays a particular sensitivity to the out-of-jet scale setting,
since the running between the boundary soft modes and the out-of-jet modes forces the
soft subjet energy spectrum to vanish at the jet boundary,$^{13}$ though the fixed order cross
section probes the soft divergence in this region. Thus we also consider several different
schemes for handling the out-of-jet scale setting. We believe that our scale variation bands
are representative, and this is supported by the agreement with the Monte Carlo.
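
The envelope construction described above can be sketched in a few lines. The example below is only an illustration of the bookkeeping: `toy_prediction` is a hypothetical stand-in with a mild logarithmic scale dependence, not the factorized cross section, and the three scale names are labels chosen for this sketch. Each scale is varied by a factor of 2 up and down, and the band is the pointwise envelope over all combinations.

```python
import itertools
import math

SCALES = ("subjet_splitting", "in_jet_soft", "out_of_jet_soft")
FACTORS = (0.5, 1.0, 2.0)  # vary each scale down and up by a factor of 2


def toy_prediction(d2, factors):
    """Hypothetical stand-in for a resummed spectrum, NOT the paper's formula.

    Each scale enters through a small logarithmic correction, mimicking the
    residual scale dependence of a truncated perturbative result.
    """
    shape = d2 * math.exp(-d2)
    correction = 1.0 + 0.1 * sum(math.log(f) for f in factors)
    return shape * correction


def scale_band(d2_values):
    """Return (lower, central, upper) curves from all combinations of factors."""
    combos = list(itertools.product(FACTORS, repeat=len(SCALES)))
    lower, central, upper = [], [], []
    for d2 in d2_values:
        values = [toy_prediction(d2, combo) for combo in combos]
        lower.append(min(values))
        upper.append(max(values))
        central.append(toy_prediction(d2, (1.0,) * len(SCALES)))
    return lower, central, upper


if __name__ == "__main__":
    grid = [0.25 * i for i in range(1, 13)]
    lo, mid, hi = scale_band(grid)
    for d2, a, b, c in zip(grid, lo, mid, hi):
        print(f"D2={d2:4.2f}  lower={a:.4f}  central={b:.4f}  upper={c:.4f}")
```

In this sketch every scale is varied independently. A faithful implementation would tie the soft subjet scale to the collinear zero bin scale, as described above, so that the two are never varied separately.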

Having understood the perturbative uncertainty bands, we now discuss in more detail 
the discrepancy between the different Monte Carlo generators arising at small values of D 2 , 

13 As explained in Ref. [76], this is connected with the buffer region of Ref. [140]. 
Figure 15: Analytic prediction for the $D_2$ distribution for background QCD jets including
the envelope of the perturbative scale variation, as compared with an analytic calculation 
including just the collinear subjets region of phase space. The effect of the soft subjet 
region of phase space is clearly visible at small values of D 2 . 


as exemplified by the difference between the $p_T$-ordered Vincia and Pythia distributions.
To understand the origin of this discrepancy, we begin by understanding the effect of 
the soft subjet region of phase space in our analytic calculation. This is possible due 
to our complete separation of the phase space using the energy correlation functions. In 
particular, because we have formulations of distinct factorization theorems in the soft subjet 
and collinear subjets regions of phase space, we can make an analytic prediction for the 
contribution arising just from the collinear subjets region of phase space. In Fig. 15 we 
show a comparison of the D 2 distribution for background QCD jets as computed using our 
complete factorization theorem, incorporating both the soft subjet and collinear subjets 
region of phase space, as compared with the calculation incorporating only the collinear 
subjets region of phase space. Comparing the two curves, we are able to understand the 
effect of the soft subjet region of phase space. In particular, we see that the soft subjet has 
a considerable effect on the distribution at small values of D 2 , giving rise to a more peaked 
distribution, with the peak at smaller values of D 2 , as compared to the result computed 
using only the collinear subjets region of phase space. Although the perturbative error 
bands are large, the systematic effect of the soft subjet region of phase space is clear. 

One further feature of the D 2 distributions, which is made clear by Fig. 15, is that the 
full $D_2$ distribution is not the result of a single Sudakov peak, and therefore our intuition
about the behavior of different orders in the perturbative expansion, and the behavior of 
scale variations from traditional event shapes fails. In particular, while it is generically the 
case for traditional event shape distributions that lower order resummed results overshoot 
in the peak region, it is not at all clear that this behavior should be true for D 2 , and indeed 
it is not observed. Instead, the contribution from the collinear subjets alone is expected to 
undershoot the peak of the D 2 distribution, since it does not incorporate the soft subjet 









Figure 16: Comparison of the $D_2$ distribution for background QCD jets in Pythia and
$p_T$-ordered Vincia for different jet radii. In each plot, the central value is obtained using
a shower cutoff of 0.8 GeV, and the uncertainty bands are generated by varying this cutoff 
between 0.4 GeV, and 1.2 GeV. 


region of phase space. The final contribution is then obtained as a superposition of two 
distinct Sudakov peaks, and can therefore behave quite differently from traditional event 
shapes. 

Monte Carlo descriptions of the perturbative shower should provide a similar description
of collinear physics, but can differ in their description of soft wide angle radiation.
Some of these differences were discussed in Sec. 5.1. As discussed earlier, because Vincia
is a dipole-antenna shower, it should accurately describe both the hard collinear and soft
wide-angle regions of phase space. Because small values of $D_2$ are sensitive to both collinear
and soft physics, the fact that the Pythia distribution at small $D_2$ is distinct suggests that
its description of soft wide-angle physics is the reason.$^{14}$ The difference observed in our
analytic calculation arising from the soft subjet region of phase space is similar to that
observed between the $p_T$-ordered Vincia and Pythia Monte Carlo distributions. It is
therefore interesting to investigate whether the difference in Monte Carlo distributions can
arise exclusively from different descriptions of wide angle soft radiation.

We will show that for the $D_2$ observable and jet samples we studied, most of the
difference can be accounted for by differences in the treatment of unphysical infrared scales
at parton level in the Monte Carlos. Since we perform this comparison at parton level,
there is some ambiguity between effects due to the perturbative cutoff of the shower and those
arising from different descriptions of wide angle soft radiation. In particular, the Monte
Carlos will in general have different low-scale $p_T$ cutoffs at which the perturbative parton
shower is terminated. Varying this scale can potentially greatly increase or decrease the
number of soft emissions because the value of $\alpha_s$ in this region is large. In particular, for
the versions of Pythia and Vincia that were used to generate events in Fig. 14, the cutoff
in Pythia is 0.4 GeV, while the cutoff in Vincia is 0.8 GeV. Indeed, these are the default 
values for these showers. Therefore, we expect that the Pythia parton shower produces 
more soft emissions than Vincia, which would increase the value of D 2 , and potentially 
also contribute to the observed difference. 
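
The sensitivity to the shower cutoff can be made plausible with a crude estimate. The sketch below is a simplification made for illustration, not the setup used for the simulations: it uses one-loop running with a fixed number of flavors, `n_f = 5`, rather than two-loop running in the CMW scheme, and it takes the integral of $\alpha_s(p_T)\, dp_T / p_T$ between the cutoff and a hypothetical reference scale of 10 GeV as a rough proxy for the amount of soft radiation. Color factors and the actual emission kernels are ignored.

```python
import math

M_Z = 91.1876       # GeV
ALPHA_S_MZ = 0.118  # value of alpha_s(m_Z) quoted in the text


def alpha_s_one_loop(mu, n_f=5):
    """One-loop running coupling evolved from alpha_s(m_Z) with fixed n_f."""
    b0 = (33.0 - 2.0 * n_f) / (12.0 * math.pi)
    denom = 1.0 + ALPHA_S_MZ * b0 * math.log(mu ** 2 / M_Z ** 2)
    if denom <= 0.0:
        raise ValueError("mu is at or below the one-loop Landau pole")
    return ALPHA_S_MZ / denom


def emission_weight(cutoff, hard_scale=10.0, steps=2000):
    """Integral of alpha_s(pT) dpT/pT from cutoff to hard_scale (midpoint rule in ln pT)."""
    lo, hi = math.log(cutoff), math.log(hard_scale)
    dt = (hi - lo) / steps
    return sum(alpha_s_one_loop(math.exp(lo + (i + 0.5) * dt)) for i in range(steps)) * dt


if __name__ == "__main__":
    for cutoff in (0.4, 0.8, 1.2):
        print(f"cutoff={cutoff:3.1f} GeV  alpha_s={alpha_s_one_loop(cutoff):.3f}  "
              f"weight={emission_weight(cutoff):.3f}")
```

Because the coupling grows rapidly towards the cutoff, the interval between 0.4 GeV and 0.8 GeV contributes disproportionately to this proxy, in line with the expectation that a lower cutoff produces more soft emissions.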

To attempt to disentangle the effects of the shower cutoff from differences in the modeling
of soft radiation, in Fig. 16, we consider Monte Carlo predictions of the $D_2$ distribution
as measured on QCD jets, with different jet radii, namely $R = 0.5, 0.7, 1.0, 1.2$. By using
different jet radii, we can control the importance of the soft subjet region of phase
space. With small jet radii, the soft subjet region of phase space does not exist, while it
becomes increasingly important as the jet radius is increased. Analytic predictions for the
$D_2$ distribution for different values of the jet radius, $R$, will be given in Sec. 5.3. Here we
compare the $p_T$-ordered Vincia and Pythia Monte Carlos. To generate the central values
of the curves, we have used a cutoff of 0.8 GeV, and the uncertainty bands are generated by
varying this cutoff from 0.4 GeV to 1.2 GeV, to understand its effect. From Fig. 16 we see
that while there is a relatively large uncertainty band from varying the perturbative cutoff
of the shower, the bands do overlap for all jet radii studied. This suggests that the dominant
difference between the $D_2$ distributions from Vincia and Pythia is due to emissions at a
scale near the parton shower cutoff.

This analysis also shows some of the difficulties in disentangling perturbative from 
non-perturbative effects, and the importance of having analytic calculations and precise
theoretical control of different phase space regions to do so. However, by measuring sufficiently
many observables on a jet, we are able to isolate distinct phase space regions and
study in detail the extent to which Monte Carlo parton showers reproduce the physics in 

14 Part of the reason for why Pythia seems to not correctly describe the soft, wide-angle region of phase 
space may be due to the fact that while it uses kinematics of dipoles in its shower, it still uses the Altarelli- 
Parisi splitting functions as an approximation of the squared matrix element. The dipole and its emission is 
then boosted to the appropriate frame, which may over-populate the soft wide-angle region of phase space 
as compared to the eikonal matrix element. We thank Torbjorn Sjostrand and Peter Skands for detailed 
discussions of this point. 
the different regions. $D_2$, or similar jet substructure observables, could therefore be powerful
tools for tuning Monte Carlos, both to formally-accurate perturbative calculations, as
well as data. In the remaining sections of the paper, we will use the default shower cutoffs 
in the Monte Carlo generators, as was done in Fig. 14, and will not show uncertainty bands 
on our Monte Carlo distributions from varying this parameter. 

5.3 Analytic Jet Radius Dependence 

As demonstrated in the previous section, the region of small $D_2$ is a sensitive probe of the 
dominant soft or collinear structure in the jet. It is therefore interesting to study the jet 
radius dependence of $D_2$ analytically, because the relative size of the soft subjet and 
collinear subjets contributions to $D_2$ depends on the jet radius. At large jet radius, as shown 
earlier, the soft subjet region is an important contribution at small $D_2$, but as the jet 
radius decreases, the collinear subjets should dominate. In this section, we study the 
jet radius dependence of $D_2$ and compare our analytic calculation to Monte Carlo. This 
will also demonstrate that our analytic calculation accurately describes the R dependence 
of the $D_2$ distribution. As in the previous section, we restrict this study to $p_T$-ordered 
Vincia and Pythia showers, and take the jet radius to be R = 0.5, 0.7, 1.0, 1.2, which 
are representative of a wide range of values of experimental interest. Larger values of 
R can be straightforwardly studied with our approach, but are of less phenomenological 
interest. For smaller values of R, logarithms of R are expected to become numerically 
important [47, 141, 142], so we do not consider them here. 

Comparisons of parton level Monte Carlo results from both $p_T$-ordered Vincia and 
Pythia to our analytic calculation are shown in Fig. 17. Since we scan over a range of jet 
radii, perturbative uncertainties for each R value are not as extensively explored as earlier 
with R = 1, and are only meant as a rough estimate of the perturbative uncertainty. Our 
focus here is simply to show that the scaling behavior with R agrees between our analytic 
calculation and the Monte Carlos. There is excellent agreement between the Monte Carlo 
results and our analytic calculations over the entire range of R values. For R > 1, there is 
some disagreement in the position of the peak of the distribution between the generators, 
though, as shown earlier, this can be accounted for by adjusting the shower cutoffs. In 
the peak region, hadronization will play an important role, smearing out differences 
between Monte Carlos. The effect of hadronization, and its implementation in our analytic 
calculation, will be discussed in Sec. 5.5. 

Figure 17: Comparison of QCD background $D_2$ distributions from $p_T$-ordered Vincia 
and Pythia to our analytic prediction as a function of the jet radius, R. The values 
R = 0.5, 0.7, 1.0, 1.2 are shown in Figures a)-d), respectively. In the analytic prediction for 
R = 0.5, only the collinear subjets factorization theorem is used, while for all other values of 
the jet radius the analytic calculation includes contributions from both the collinear subjets 
and soft subjet factorization theorems. The pinch in the scale variations is a consequence 
of unit normalizing the distributions. 

For jet radii of R = 0.7, 1.0, 1.2 our analytic calculation consists of both collinear 
subjets and soft subjet contributions. For R = 0.5, however, we only include the contribution 
from collinear subjets, which is guided by our matching procedure between the collinear 
subjets and soft subjet factorization theorems, as discussed in Sec. 4.4.1. For a fixed jet 
mass, as the value of R is decreased, the region of validity of the soft subjet factorization 
theorem vanishes rapidly. For jet masses in the range $80 < m_J < 100$ GeV, and Q = 1 
TeV, we find that between R = 0.7 and R = 0.5 the region of validity of the soft subjet 
factorization theorem rapidly shrinks to zero, and there should not be a transition between 
the collinear subjets factorization theorem and the soft subjet factorization theorem. Because 
of this, for the value of R = 0.7, our perturbative error bands are more extensive, and are 
taken as the envelope of both the curves that include the matched soft subjet and the curves 
that do not. While this is certainly overly conservative as an error estimate, we have included 
it to emphasize this point. This feature is also seen explicitly in the plots of Fig. 17, where 
the region of disagreement between the different Monte Carlo generators is squeezed towards 
zero. A similar effect occurs as the energy (or $p_T$) of the jet is increased with a fixed jet 
mass, which will be discussed in Sec. 5.4. 
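
The envelope prescription and the pinch from unit normalization can be illustrated with a minimal sketch. It is not the code used to produce Fig. 17: the toy curves, the grid, and the helper names `unit_normalize` and `envelope` are hypothetical and stand in for scale-varied predictions. The sketch only shows how a pointwise band is built from a family of unit-normalized curves, and that such curves must cross, which narrows the band near the peak.

```python
import math

def trapezoid(xs, ys):
    """Trapezoidal integral of samples ys on the grid xs."""
    return sum(0.5 * (ys[i] + ys[i + 1]) * (xs[i + 1] - xs[i])
               for i in range(len(xs) - 1))

def unit_normalize(xs, ys):
    """Rescale a sampled curve so that it integrates to one on the grid."""
    area = trapezoid(xs, ys)
    return [y / area for y in ys]

def envelope(curves):
    """Pointwise lower and upper edges of a family of curves on a common grid."""
    lower = [min(vals) for vals in zip(*curves)]
    upper = [max(vals) for vals in zip(*curves)]
    return lower, upper

def toy_shape(x, width):
    """Hypothetical peaked distribution, used only as a stand-in curve."""
    return x * math.exp(-x / width)

# Hypothetical grid in D_2 and three toy "variations" of the same prediction.
xs = [0.05 * i for i in range(1, 201)]
curves = [unit_normalize(xs, [toy_shape(x, w) for x in xs])
          for w in (0.8, 1.0, 1.2)]

lower, upper = envelope(curves)
band_width = [u - l for l, u in zip(lower, upper)]
```

Because every curve integrates to one, a curve that lies above the others below the peak must lie below them somewhere else, so the band narrows where the curves cross.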

Throughout the remainder of this paper, we will study the case R = 1 exclusively, 
because both the collinear subjets and soft subjet regions of phase space must then be included, 
and because this radius is relevant to a large number of jet substructure studies using fat jets. 

5.4 Analytic Jet Energy Dependence 

In addition to studying the dependence on the jet radius as a probe of the importance of 
the soft subjet and of the Monte Carlo description of the shower, it is also interesting to 
study the dependence of the D 2 distribution on the energy of the jet, with a fixed mass cut. 
For highly energetic jets, one expects that the soft subjet will play a negligible role, as the 
region of validity of the soft subjet factorization theorem shrinks as the energy of the jet 
is increased, as long as the mass of the jet is kept fixed. On the other hand, since we keep 
the jet radius used in the clustering fixed, the angular separation of the collinear particles 
decreases with energy, but the phase space for wide angle global soft radiation increases 
considerably. This radiation is present both in the collinear subjets and soft haze 
factorization theorems. It is also of course present in the soft subjet factorization theorem, although 
we have argued that we expect this to give a small contribution. Studying the jet energy 
dependence therefore probes the behavior of the generators in a fashion complementary to 
the R dependence. 

In this section, we study the perturbative $D_2$ distribution for center of mass energies 
ranging from 500 GeV to 2 TeV, for a fixed jet radius of R = 1, and with a fixed mass cut of 
$80 < m_J < 100$ GeV. This region of energies covers the majority of the phenomenologically 
interesting phase space available at the LHC. We will also perform a more detailed study 
at LEP energies in Sec. 6. For our resummation, we require (amongst other things) that 
$e_2^{(\alpha)} \ll 1$. For the case of $\alpha = 2$, in which we will be most interested, this 
corresponds to the assumption $e_2^{(2)} = m_J^2/E_J^2 \ll 1$. For a mass cut around the Z pole 
mass, this expansion is valid throughout the range of energies we consider. The case when 
$e_2^{(2)} < 1$, but not parametrically so, is outside the scope of this paper. 
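
As a quick numerical check of the size of the expansion parameter, the following sketch evaluates $m_J^2/E_J^2$ for a jet at the Z pole mass. It assumes, for illustration only, that each jet carries half of the center of mass energy, $E_J = Q/2$; the function names are hypothetical.

```python
def e2_beta2(m_jet, e_jet):
    """Two-point energy correlation function with angular exponent 2, m_J^2 / E_J^2."""
    return (m_jet / e_jet) ** 2

M_Z = 91.19  # GeV, Z pole mass

def e2_table(q_values=(500.0, 1000.0, 2000.0)):
    """Return {Q: e_2^(2)}, assuming each jet carries E_J = Q / 2 (an assumption)."""
    return {q: e2_beta2(M_Z, q / 2.0) for q in q_values}

for q, e2 in e2_table().items():
    print(f"Q = {q:6.0f} GeV  ->  e2 = {e2:.4f}")
```

Under this assumption the parameter is about 0.13 at Q = 500 GeV and falls below 0.01 at Q = 2 TeV, so it is smallest at the highest energies considered.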

In Fig. 18 we show distributions for the $D_2$ observable as obtained from Monte Carlo 
simulation, and compared with our analytic calculation. As in Sec. 5.3, we restrict to 
$p_T$-ordered Vincia and Pythia at parton level. The perturbative scale variations for each 
energy value are less extensively explored and are only meant to provide a rough estimate 
of the perturbative uncertainty. The evolution of the difference between the Vincia and 
Pythia generators is again quite fascinating, with the discrepancy between the generators 
increasing significantly with energy, to the point that at 2 TeV the qualitative shapes of the 
distributions do not agree. In particular, the behavior at small $D_2$ is completely different 
between the two generators, with Vincia having a large peak, which is not present in 
Pythia. 
Figure 18: Comparison of QCD background $D_2$ distributions from $p_T$-ordered Vincia 
and Pythia to our analytic prediction as a function of the jet energy, $E_J$. The values 
$E_J$ = 500 GeV and 2 TeV are shown in Figures a) and b), respectively. A jet radius 
of R = 1 is used for all values of the jet energy. The pinch in the scale variations is a 
consequence of unit normalizing the distributions. 


As discussed in Sec. 5.2, this discrepancy between the generators is dominantly due 
to differences in the treatment of the parton shower cutoff. As the energy is increased 
with a fixed jet mass and jet radius, emissions that contribute to $D_2$ are forced to have 
smaller and smaller energy. As evidence that this is indeed the cause of the discrepancy, 
we have checked that the conclusions of Sec. 5.3 remain true at higher energy, as long 
as the jet radius is taken to scale as $R \sim 2 m_J/p_T$, so that it constrains the wide angle 
soft radiation. For example, for R = 0.2 at 2 TeV, we find excellent agreement between 
the $D_2$ distributions as generated by Pythia and Vincia. 15 Because the disagreement 
between the generators is so large, and arises from the modeling of soft 
radiation, this may be an excellent observable with which to study soft radiation and color 
coherence in parton showers. 
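
As a rough numerical illustration of this scaling (a sketch only, assuming $p_T \approx E_J = Q/2$ for back-to-back jets, which is an assumption made here for the estimate), the mass window $80 < m_J < 100$ GeV at a center of mass energy of 2 TeV gives

```latex
R \sim \frac{2 m_J}{p_T}
  \approx \frac{2 \times (80\text{--}100)\ \mathrm{GeV}}{1000\ \mathrm{GeV}}
  \approx 0.16\text{--}0.20 ,
```

which is consistent with the value R = 0.2 used for the comparison at 2 TeV.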

As a reference, in App. I we show distributions of the $e_2^{(2)}$ observable, measured at 
both 500 GeV and 2 TeV for both the Vincia and Pythia Monte Carlos, and at both 
parton and hadron level. Unlike for the $D_2$ observable, since $e_2^{(2)}$ is set by a single 
emission, excellent agreement is observed for the $e_2^{(2)}$ observable between Pythia and 
Vincia at parton level, emphasizing that $D_2$ offers a more differential probe of the 
perturbative shower than single emission observables. 

Our analytic predictions at 2 TeV, as shown in Fig. 18, are intermediate between the 
Pythia and Vincia results. They exhibit a peaked structure at small values of $D_2$, but 
not to the extent seen in the Vincia distribution. We believe that this is largely due to 
the normalization of the distributions, and the fact that we do not match to fixed order 
in the tail of the distribution. Since this tail becomes longer at higher energies, a larger 
disagreement in the peak region is also seen. On the other hand, at 500 GeV, our analytic 
prediction has a large peak. This is evidence that, because the $D_2$ spectrum is much more 
sharply peaked at 500 GeV, higher order resummation may be more important in the peak 
region. However, the relatively good agreement between analytics and Monte Carlo shows 
that our factorization theorem is able to accurately capture the energy dependence over a 
large range of energies. 

15 We include this plot in App. I, Fig. 35, for reference. 

It is important to note that hadronization will remove some of the discrepancies in the 
$D_2$ distributions between the Vincia and Pythia generators, especially at high energies, 
where it will smear out the peak at low values of $D_2$. While this qualitatively improves 
the behavior of the distributions, discrepancies in the shape still remain. This will be 
discussed in detail in Sec. 5.5, along with its incorporation into our analytic calculation. 
For comparison to precision analytic calculations and for interpreting data, it is vital that 
Monte Carlo generators provide accurate descriptions of both the perturbative and 
non-perturbative aspects of QCD jets, and do not compensate for perturbative discrepancies by 
the tuning of non-perturbative parameters. This is especially important for disentangling 
non-perturbative effects from perturbative effects, the latter of which should in principle be 
under much better control, and for extracting reliable information about non-perturbative 
QCD from jet physics. 

Throughout the rest of the paper we will focus on jets with radius R = 1 at a center 
of mass energy of 1 TeV. 

5.5 Impact of Hadronization 

Hadronization plays an important role in a complete description of any jet observable, 
and a description of non-perturbative effects, preferably from a field-theoretic approach, 
is required to compare with experimental data. An advantage of the factorization 
approach taken in this paper is that it allows for a clean separation of perturbative and 
non-perturbative physics. Non-perturbative effects enter the factorization theorems presented 
in Sec. 3 through the soft function, which describes the dynamics of soft radiation, both 
perturbative and non-perturbative, between the jets. For a large class of additive 
observables, the treatment of non-perturbative physics in the soft function is well-understood, 
and can be incorporated using shape functions [83, 84, 143-145]. Shape functions have 
support over a region of size $\Lambda_{\rm QCD}$, and are convolved with the perturbative soft 
function. In the tail region of the distribution, where the observable is dominated by 
perturbative emissions, they reduce to a shift. For a large class of observables, this shift is 
determined by a universal [146, 147] non-perturbative parameter multiplied by a calculable, 
observable-dependent number [147-149]. Similar shape functions have also been used to 
incorporate the effects of pile-up and the underlying event at hadron colliders [150]. 

The effect of non-perturbative physics on multi-differential cross sections has not been 
well-studied. For the double differential cross section of two angularities, Ref. [70] 
considered using uncorrelated shape functions for each angularity individually, but it is expected 
that a complete description would require a shape function incorporating non-perturbative 
correlations between observables. For the particular case of the $D_2$ observable, we will 
argue that a single parameter shape function can be used to accurately describe the 
dominant non-perturbative effects, and in particular, that a study of multi-differential shape 
functions with non-perturbative correlations is not required. Of course, to justify the use 
of a shape function requires the observable in question to be infrared and collinear safe. 
Therefore, we will only consider non-perturbative corrections to $D_2$ in the presence of a 
mass cut on the jet. 

In Sec. 4.3 we performed a study of the fixed order singular structure of the $D_2$ 
observable in the presence of a jet mass cut. Importantly, we showed that $D_2$ only has a 
singularity at $D_2 = 0$, with its behavior at all other values regulated by the mass cut. 
Non-perturbative corrections to the $D_2$ observable will play an important role only when 
the soft scale becomes non-perturbative, which as just argued, for a perturbative mass cut 
of the form studied in this paper, only occurs as $D_2 \to 0$. Recall that the $D_2$ observable is 
defined as 

$$D_2^{(\alpha,\beta)} = \frac{e_3^{(\alpha)}}{\left(e_2^{(\beta)}\right)^{3\alpha/\beta}}\,, \tag{5.1}$$

which is not additive. However, in the two-prong region of phase space, namely $D_2 \to 0$, the 
value of $e_2^{(\beta)}$ is set to leading power by the hard splitting, and so $D_2$ effectively 
reduces to an additive observable. In this region of phase space the description of 
non-perturbative effects in terms of a shape function can therefore be rigorously justified from 
our factorization theorem, and it can be applied directly to the $D_2$ distribution. For large 
values of $D_2$, it is not additive, and the use of a shape function cannot be formally justified. 
However, in this region, a shape function is not required, as any singular behavior is regulated 
by a mass cut. We therefore will use a shape function that falls off exponentially at large 
values of $D_2$. We believe that this is a self-consistent approach until non-perturbative 
corrections to multi-differential cross sections are better understood. 

In the two-prong region of phase space, we have shown that two distinct factorization 
theorems, namely the soft subjet and collinear subjets, are required, and in Sec. 4.4.1 we 
showed how these two descriptions can be merged to provide a complete description of the 
two-prong region of phase space. Importantly, the two factorization theorems describing 
the two-prong region of phase space have soft functions with different numbers of Wilson 
lines. The collinear subjets soft function is a two-eikonal line soft function, while the soft 
subjet soft function has three eikonal lines. Since the shape function describes the 
non-perturbative contribution to the soft function, in general we should allow for two distinct 
shape functions, with independent parameterizations. The zero-bin merging procedure in 
Sec. 4.4.1 would then be performed on the non-perturbative cross sections, after 
convolution with the appropriate shape function. However, at the level of perturbative accuracy 
at which we work, and because we will simply be extracting our shape function parameters 
by comparing to Monte Carlo, the use of distinct parameterizations of different shape 
functions for both the soft subjet and collinear subjets soft functions would introduce many 
redundant parameters. To simplify the situation in this initial investigation, we will choose 
to use the same parametrization of the shape function, and the same non-perturbative 
parameters for both soft functions. This allows for the non-perturbative corrections to be 
described by a single parameter, and as we will see provides an excellent description of the 
Monte Carlo data. Because we use the same shape function for both the soft subjet and 
collinear subjets soft functions, it also implies that the shape function can be applied after 
the zero-bin merging procedure, namely, directly at the level of the $D_2$ distribution. 

As a simple parametrization of a shape function for $D_2$, we follow Ref. [150] and 
consider 

$$F(\epsilon) = \frac{4\epsilon}{\Omega_D^2}\, e^{-2\epsilon/\Omega_D}\,, \tag{5.2}$$

where $\epsilon$ is the energy and $\Omega_D \sim \Lambda_{\rm QCD}$ is a non-perturbative scale. Note that 
while we will use the same value of $\Omega_D$ for the signal and background distributions, it will 
have very different effects on the two distributions, which will arise naturally from the power 
counting in the different factorization theorems, as will be shown in this section. The function 
of Eq. (5.2) satisfies the required properties that it is normalized to 1, has a finite first 
moment, vanishes at $\epsilon = 0$, and falls off exponentially at high energies [144]. More general 
bases of shape functions are discussed in Ref. [145], although we find that the single parameter 
shape function of Eq. (5.2) is sufficient to describe the dominant effects of hadronization. 
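
The stated properties of the shape function can be checked numerically. The sketch below assumes the single parameter form $F(\epsilon) = (4\epsilon/\Omega_D^2)\, e^{-2\epsilon/\Omega_D}$ for Eq. (5.2) and uses the value $\Omega_D = 0.34$ GeV quoted later in this section; the function names, the integration cutoff, and the number of steps are illustrative choices, not part of the calculation in the text.

```python
import math

def shape_function(eps, omega_d):
    """Single parameter shape function, (4 eps / Omega_D^2) * exp(-2 eps / Omega_D)."""
    return 4.0 * eps / omega_d**2 * math.exp(-2.0 * eps / omega_d)

def moment(n, omega_d, eps_max_factor=40.0, steps=20000):
    """n-th moment of the shape function by Simpson's rule on [0, eps_max_factor * Omega_D].

    `steps` must be even; the upper cutoff is an illustrative choice, large enough
    that the exponential tail beyond it is negligible.
    """
    upper = eps_max_factor * omega_d
    h = upper / steps
    total = 0.0
    for i in range(steps + 1):
        eps = i * h
        weight = 1 if i in (0, steps) else (4 if i % 2 else 2)
        total += weight * eps**n * shape_function(eps, omega_d)
    return total * h / 3.0

omega_d = 0.34                 # GeV
norm = moment(0, omega_d)      # normalization, expected to be close to 1
mean = moment(1, omega_d)      # first moment, expected to be close to Omega_D
```

With this form the first moment equals $\Omega_D$, which is why a single parameter controls the leading shift of the distribution.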

As discussed above, we will use the shape function of Eq. (5.2) for both the collinear 
subjets and soft subjets factorization theorems, with the same value of $\Omega_D$ in both cases. 
Because we have enforced this simplification to reduce the number of parameters, it is then 
most interesting to focus on $\Omega_D$ for the collinear subjets factorization theorem, which has 
two eikonal lines. In this case, we can show that we can relate the $\Omega_D$ parameter to 
universal non-perturbative parameters appearing in $e^+e^- \to$ dijet factorization theorems, which 
have been measured in experiment. Therefore, throughout the rest of this section, we will 
focus on deriving scaling relations for $\Omega_D$ assuming we are working in the collinear subjets 
factorization theorem. Again, we wish to emphasize that this is merely a simplification we 
have made to reduce the number of parameters, and a more general treatment could be 
performed, but we will see that with only the single $\Omega_D$, with properties derived 
assuming the collinear subjets factorization theorem, excellent agreement with Monte Carlo is 
observed. 

The effect of non-perturbative physics as modeled by the shape function is very different 
for background and signal distributions. For background, when $D_2$ is small, the contribution 
to $e_3^{(\alpha)}$ from a non-perturbative soft emission is 


where $\epsilon$ is the energy of the non-perturbative emission and $E_J$ is the energy of the jet, as 
shown in Eq. (3.5). The non-perturbative contribution to $D_2$ is therefore 

In terms of the shape function, the non-perturbative distribution of $D_2$ for background jets 
can then be written as a convolution: 16 

(5.5) 

where $\sigma_{np}$ and $\sigma_p$ denote the non-perturbative and perturbative cross sections, respectively. 

We can estimate the scale at which the global softs of the collinear subjets 
factorization theorem become non-perturbative from the scaling of the modes given in Eq. (3.4). 
Rewriting this scaling in terms of the center of mass energy of the $e^+e^-$ collision, Q, and 
$D_2$, we find that the global soft scale of the collinear subjets factorization theorem has 
virtuality 


Ms = 2 D 2 m z \ ~pr~ 


m z V 


Q ) 


(5.6) 


where we have assumed a jet mass, $m_J = m_Z$, as relevant for boosted Z discrimination. 
Taking $\Lambda_{\rm QCD} = 500$ MeV, we find that the global soft scale enters the non-perturbative 
regime at $D_2 \sim 1$. 

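Restricting to $\beta = 2$, in the collinear subjets region of the background jet phase space, 
the non-perturbative distribution of $D_2^{(\alpha,2)}$ is then 

$$\frac{d\sigma_{np}}{dD_2^{(\alpha,2)}} = \int_0^\infty d\epsilon\, F(\epsilon)\, \frac{d\sigma_p\!\left( D_2^{(\alpha,2)} - 2^{\alpha-2}\, \dfrac{\epsilon}{E_J \left(e_2^{(2)}\right)^{\alpha}} \right)}{dD_2^{(\alpha,2)}}\,, \tag{5.7}$$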


where we have used 

and that, in the collinear subjets region of phase space, 

Because we consider fixed-energy jets with masses in a narrow window, $e_2^{(2)}$ is just a number 
and can be removed by an appropriate change of variables. Making this change, we then have 

where the non-perturbative parameter in the shape function is effectively modified to 


$$\tilde{\Omega}_D = 2^{\alpha-2}\, \frac{\Omega_D}{E_J \left(e_2^{(2)}\right)^{\alpha}}\,. \tag{5.11}$$



16 In this initial investigation we do not include a gap in our shape function, which would implement a 
minimum hadronic energy deposit, as expected physically [144]. Such gapped shape functions, and their 
associated renormalon [151] ambiguity [152] have been studied for arbitrary angular exponents [153], and 
could be straightforwardly incorporated in our analysis. However, we observe excellent agreement with our 
single parameter shape function, which we therefore find to be sufficient for our purposes. 
The non-perturbative parameter $\Omega_D$ still has implicit dependence on the angular 
exponent $\alpha$. Because the global soft modes have the lowest virtuality and can only resolve 
the back-to-back soft Wilson lines in the $n$ and $\bar{n}$ directions, we can use the results of 
Refs. [148, 149] to extract the $\alpha$ dependence. By the boost invariance of the soft function 17 
along the $n$-$\bar{n}$ directions and the form of the observable $e_3^{(\alpha)}$ as measured on soft 
particles, it follows that $\Omega_D$ takes the form 


D 


-E 

2a - 1 


(5.12) 


where S is a universal non-perturbative matrix element of two soft Wilson lines and all 
dependence on $\alpha$ has been extracted. 18 We have normalized the matrix element such that 
the coefficient is unity for $\alpha = 2$. We will shortly discuss the extent to which the values 
of $\Omega_D$ we obtain from comparison with the parton shower agree with the known values of 
this universal non-perturbative matrix element. 

For signal jets, the lowest virtuality modes in the jet are the collinear-soft modes. 
Unlike the global soft modes of the collinear subjets factorization theorem, which did not 
resolve the substructure of the jets, allowing us to relate the non-perturbative parameter 
appearing in the shape function to that appearing in dijet event shapes, the collinear-soft 
modes in the signal factorization theorem resolve the jet substructure. However, since 
the decaying boson is a color singlet, there are still only two eikonal lines present in the 
factorization theorem. Boost invariance of the soft function will therefore again allow us 
to relate the non-perturbative parameter for the signal distribution to that appearing in 
dijet event shapes. This is similar to the argument used in Ref. [41] to calculate the signal 
distribution for 2-subjettiness. 

17 This boost invariance holds strictly only for a soft function with no jet algorithm restrictions. 
However, since we are considering fat jets close to hemispherical, we expect corrections to the 
boost invariance of the soft function to be small. 

18 In this section we ignore the effects of hadron masses, and their associated power corrections of 
$\mathcal{O}(m_H/Q)$, where $m_H$ is the mass of a stable hadron in the jet. While these power corrections 
can also be incorporated through the shape function, in general, they break the universality of the 
non-perturbative matrix element, E [91, 92]. In particular, Eq. (5.12) is no longer in general true, 
for a E that is independent of the angular exponent $\alpha$ [91, 92]. This depends on the precise 
definition of the energy correlation functions for massive particles. However, the value of E can 
still be extracted from dijet event shapes in the same universality class as a particular angularity 
[92]. Furthermore, $\Omega_D$ has a scale dependence from renormalization group evolution, 
$\Omega_D = \Omega_D(\mu)$, although this dependence is logarithmic, and is therefore small compared to 
our uncertainties. We will discuss briefly the impact of hadron masses and the renormalization 
group evolution of $\Omega_D$ in Sec. 6, and in App. H. 

A non-perturbative collinear-soft emission contributes to $e_3^{(\alpha)}$ as 

(5.13) 

where now $\epsilon$ is the energy of the non-perturbative collinear-soft emission, as shown in 
Eq. (3.6). The non-perturbative contribution to $D_2$ for signal jets is therefore 
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where in the second line we have used the parametric relationship between e^ and 
in the collinear subjets region. Convolving with the shape function, the non-perturbative 
distribution for signal jets is then 
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(5.15) 


It is important to note how the different scales for the soft radiation in the case of 
the signal and background jets lead to different behavior of the $D_2$ distributions after 
hadronization. In particular, from Eqs. (5.10) and (5.15) one can determine the shift in 
the first moment of the $D_2$ distribution caused by hadronization, which we will denote by 
$\Delta_D$. Restricting to the case $\alpha = \beta = 2$ for simplicity, we find that for the background 
distribution, 


(5.16) 


whereas for the signal jets, we have 

$$\Delta_D = \frac{\Omega_D}{E_J}\,. \tag{5.17}$$


Since $\Omega_D$ should be of the scale 1 GeV, we see that for signal jets, the shift in the first 
moment due to hadronization is highly suppressed, and behaves differently than a 
traditional event shape due to the boost factor, while for background jets, since $m_J \ll E_J$, the 
effect of hadronization is significant. We will see that both of these features, which are 
consequences of the power counting of the dominant modes, are well reproduced in the 
Monte Carlo simulations. 
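
The origin of a first-moment shift under convolution with a shape function can be made explicit with a short worked step. This is a sketch under stated assumptions: the perturbative distribution is unit normalized, the hadronized distribution is a convolution in which an emission of energy $\epsilon$ shifts $D_2$ by $\kappa\,\epsilon$, and $\kappa$ is a placeholder coefficient introduced here purely for illustration. Using that the shape function is normalized to one and taking its first moment to be $\Omega_D$,

```latex
\begin{align}
\langle D_2 \rangle_{np}
  &= \int dD_2\, D_2 \int_0^\infty d\epsilon\, F(\epsilon)\,
     \frac{d\sigma_p}{dD_2}\!\left( D_2 - \kappa\,\epsilon \right) \\
  &= \int_0^\infty d\epsilon\, F(\epsilon)
     \left( \langle D_2 \rangle_p + \kappa\,\epsilon \right)
   = \langle D_2 \rangle_p + \kappa\,\Omega_D .
\end{align}
```

With $\kappa = 1/E_J$ this has the form of the signal shift in Eq. (5.17); for $\Omega_D = 0.34$ GeV and $E_J = 500$ GeV it gives $\Delta_D \approx 7 \times 10^{-4}$.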

Comparisons between the hadron-level distributions of $D_2$ from our analytic 
calculations and the Monte Carlos are presented in Fig. 19 for background and Fig. 21 for signal 
jets. For background distributions, we compare our perturbative calculation convolved 
with the shape function, as defined in Eq. (5.5). Both Vincia and Pythia use the same 
hadronization model, but Herwig++ uses a distinct hadronization model, and therefore 
we allow for a different shape parameter, $\Omega_D$, for the two cases. For the case of Pythia 
and Vincia, we choose to extract the value of $\Omega_D$ by fitting to the hadronized distribution 
for $p_T$-ordered Vincia. However, we will shortly discuss the level of ambiguity in $\Omega_D$ 
arising from this extraction. For jets with an energy of 500 GeV and mass of 90 GeV, we 
find that the choice $\Omega_D = 0.34 \pm 0.03$ GeV provides the best agreement of our perturbative 
calculation with $p_T$-ordered Vincia, while $\Omega_D = 0.41 \pm 0.03$ GeV provides the best 
agreement with Herwig++. The errors assigned here come only from the fitting itself, and 
are due to the statistical uncertainties of the Monte Carlo distributions arising from the finite 
width of the histogram bins. These errors do not take into account any other uncertainties; 
for example, whether one should perform the fit to hadron level Vincia or Pythia. This 
level of agreement between the non-perturbative parameters extracted from Pythia and 
Herwig++ is comparable to more detailed studies, such as Ref. [92]. A comparison of 
the distributions of Fig. 19 before and after hadronization shows that hadronization has a 
considerable effect on the background distributions, particularly at small values of $D_2$, as 
expected from Eq. (5.16). This effect, which in the Monte Carlos is realized through tuned 
hadronization models, is well described by the single parameter shape function. 
Importantly, as discussed above, if different shape parameters were used for the collinear subjets 
and soft subjets factorization theorems, they would be nearly degenerate in the fit at the 
level of perturbative accuracy at which we work, which is why we have made the simplification 
of working with a single non-perturbative shape parameter. 

Figure 19: A comparison of the $D_2$ distributions for background QCD jets from our 
analytic prediction and the various hadron-level Monte Carlos. $\sigma_p$ denotes the parton level 
perturbative prediction for the distribution and $\sigma_{np} = \sigma_p \otimes F_D$ is the perturbative 
prediction convolved with the non-perturbative shape function. The values of the 
non-perturbative parameter $\Omega_D$ used are also shown. 

We have argued that the non-perturbative parameter $\Omega_D$ in the collinear subjets 
factorization theorem can be related to a universal non-perturbative matrix element of two 
soft Wilson lines. Such non-perturbative matrix elements appear in the factorization 
theorems of a large class of $e^+e^-$ event observables, and have therefore been measured from data 
at LEP. 19 While the value of $\Omega_D$ that we have determined for the two parton showers is by 
no means precise, it is interesting to compare our value with those extracted from precision 
studies of $e^+e^-$ collider observables which have been performed in the literature. Using 
the particular case of $\alpha = \beta = 2$, and converting to our normalization, a recent 
extraction of the non-perturbative parameter from an N$^3$LL$'$ analysis of the C-parameter event 
shape using LEP data, and including power corrections and hadron mass effects [91, 92], 
gives a value of $\Omega_D = 0.28$ GeV [158, 159]. This agrees well with our values extracted 
through comparison with Monte Carlo. Going forward, with the goal of increasing both 
the precision and understanding of jet substructure, the ability to relate the dominant 
non-perturbative corrections to the $D_2$ observable to known non-perturbative parameters 
measured in $e^+e^-$ is a valuable feature, and further study of the non-perturbative 
corrections to multi-differential cross sections is of great importance. 

Many of the features of the background distributions which were present before 
hadronization in Fig. 14a persist after convolution with the shape function. However, they are 
greatly reduced, and they become difficult to disentangle from modifications to the 
non-perturbative shape parameter at the order we work. In particular, from Fig. 19, we see 
that for the choices of $\Omega_D$ that we have used, both Vincia showers agree well with our 
analytic calculation. On the other hand, with the chosen value of $\Omega_D$, the $D_2$ distribution 
in Pythia is systematically pushed to higher values as compared with our calculation. 

19 An extremely large literature exists on such measurements, and their theoretical interpretation, 
to which we cannot do justice in this brief section. We refer the reader to, for example, 
Refs. [83, 84, 154-159] and references therein. 

To assess the extent to which this can be accommodated by adjusting the 
value of $\Omega_D$, in Fig. 20 we show plots of both Pythia and Vincia with $p_T$ ordering 
compared with our analytic results for two different values of the shape parameter. The values 
$\Omega_D = 0.34$ GeV and $\Omega_D = 0.47$ GeV were chosen to give best agreement with the 
Vincia and Pythia distributions, respectively. This figure makes clear that the disagreement 
between the $D_2$ distributions as generated by the two Monte Carlo generators can largely 
be remedied by using different values of the non-perturbative parameter. We note also 
that the effect of changing the non-perturbative parameter is of course similar to that of 
changing the perturbative cutoff of the shower, as was discussed in Sec. 5.2, making it 
difficult to disentangle these two effects. 

Figure 20: A comparison of the $D_2$ distributions for background QCD jets from our 
analytic prediction and Pythia and $p_T$-ordered Vincia Monte Carlos in a) and b). Analytic 
predictions for different values of the non-perturbative shape parameter are shown. 

This plot also gives a feel for the extent to which $\Omega_D$ can be varied before significant
disagreement is seen between the analytic calculation and a given Monte Carlo distribution.
Performing the perturbative calculation to higher accuracy would help to resolve some
of these ambiguities in the value of the shape parameter, by reducing the perturbative
uncertainty on the shape of the distribution, as well as its normalization. Throughout the
rest of this paper, when comparing our analytic predictions with Vincia or Pythia, we will
use the value $\Omega_D = 0.34$ GeV as obtained from our fit to hadron level $p_\perp$-ordered Vincia.
However, one should keep in mind the level of sensitivity to this parameter. In particular,
for the application of boosted Z discrimination, we will see that the discrimination power
of the observable will depend sensitively on the shape of the $D_2$ distribution below the
peak, and will therefore exhibit great sensitivity to the value of the shape parameter.
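
As a rough illustration of how a shape parameter moves a distribution, the following Python sketch convolves a toy perturbative spectrum with a single-parameter shape function and compares first moments. The functional form $F(\epsilon) = (4\epsilon/\Omega^2)\,e^{-2\epsilon/\Omega}$, the toy spectrum, the grid, and the dimensionless parameter `omega` (standing in for the shape parameter after rescaling into units of the observable) are all assumptions made for illustration; they are not taken from Eq. (5.2) or from the calculation described here.

```python
import math


def shape_function(eps, omega):
    """Assumed single-parameter form F(eps) = (4 eps / omega^2) exp(-2 eps / omega).

    This form is normalized to one and has first moment omega.
    """
    if eps < 0.0:
        return 0.0
    return 4.0 * eps / omega**2 * math.exp(-2.0 * eps / omega)


def convolve(pert, grid, omega):
    """Riemann-sum version of sigma_np(x) = int d(eps) sigma_p(x - eps) F(eps).

    `grid` must be uniform; `pert` holds sigma_p evaluated on `grid`.
    """
    step = grid[1] - grid[0]
    result = []
    for i in range(len(grid)):
        total = 0.0
        for j in range(i + 1):
            total += pert[j] * shape_function(grid[i] - grid[j], omega) * step
        result.append(total)
    return result


def first_moment(dist, grid):
    """First moment of a distribution sampled on a uniform grid, after unit normalization."""
    norm = sum(dist)
    return sum(x * d for x, d in zip(grid, dist)) / norm


# Toy perturbative spectrum (hypothetical, not the D_2 distribution of this paper).
grid = [0.02 * i for i in range(501)]
pert = [x * x * math.exp(-2.0 * x) for x in grid]

# Two illustrative values of the rescaled shape parameter.
for omega in (0.3, 0.5):
    hadronized = convolve(pert, grid, omega)
    shift = first_moment(hadronized, grid) - first_moment(pert, grid)
    print(f"omega = {omega}: shift in first moment = {shift:.3f}")
```

Because the first moment of a convolution is the sum of the first moments of its two factors, a larger `omega` pushes the toy distribution to larger values, which is the qualitative behavior discussed for the two values of $\Omega_D$ in Fig. 20.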

For the signal distributions, shown in Fig. 21, we use the same choice of non-perturbative
parameters as for the background distributions. From Eq. (5.17), we have seen that for the
jets with $E_J = 500$ GeV, the non-perturbative shift is expected to be of the order 1/500,
and is therefore completely negligible at the level of accuracy to which we work, so the equality
of the non-perturbative parameters between the signal and background distributions is
not tested. For the signal distributions, we see excellent agreement between the theory prediction
and all the Monte Carlo generators. Due to the sharp peak in the distribution, we
expect that higher order resummation is necessary to provide a more accurate description right
in the peak region, where the perturbative uncertainty in our calculation becomes large.
Because the distributions are normalized, this uncertainty also manifests itself
in the tail of the distribution. It is known how to calculate the signal distribution to higher
accuracy [41], and so we do not consider this issue further here. The effect of the shape
function on our analytic results is consistent with all of the Monte Carlos, whose signal
$D_2$ distribution is changed only slightly (i.e., only in the lowest bins) after hadronization.

Figure 21: A comparison of the $D_2^{(2,2)}$ distributions for signal boosted Z jets from our
analytic prediction and the various hadron-level Monte Carlos, in panels (a) to (d). $\sigma_p$ denotes
the parton level perturbative prediction for the distribution and $\sigma_{np} = \sigma_p \otimes F_D$ is the
perturbative prediction convolved with the non-perturbative shape function, although for the signal this
has a negligible effect. The values of the non-perturbative parameter $\Omega_D$ used are also
shown.

We conclude this section by emphasizing how the choice of variable can greatly facilitate 
comparisons with Monte Carlos. An important feature of the D 2 observable is that it 
cleanly separates phase space regions dominated by different physics. In particular, it 
separates the region of phase space where a subjet is formed from that where no subjet is 
formed, as well as separating the regions of phase space where hadronization is important 
from those where it plays a minor role. This enables these effects to be cleanly disentangled, 
and provides a sensitive probe of their modeling. We therefore believe that the observable 
D 2 could play an important role in the tuning of Monte Carlo generators for jet substructure 
studies, and could be used to complement some of the observables proposed and studied in 
Refs. [121, 122]. 20 Furthermore, the observable D 3 [67], which is sensitive to three-prong 
substructure within a jet also provides a clean separation of two- and three-prong regions, 
and could be used to provide an even more detailed understanding of jet substructure and 
the perturbative shower evolution. 

5.6 Analytic Boosted Z Discrimination with D 2 

In this section, we use our analytic calculation, combined with the non-perturbative shape
functions of Sec. 5.5, to make complete predictions for the discrimination power of $D_2^{(2,2)}$
for hadronically-decaying boosted Z bosons versus QCD quark jets at an $e^+e^-$ collider. We
present comparisons of our calculation to the results of fully hadronized Pythia, Vincia,
and Herwig Monte Carlos. Here, we also present Monte Carlo results from scanning
over a range of values for the angular exponent $\alpha$ that is consistent with our factorization
theorem. Analytic results for boosted boson discrimination were also presented recently in
Ref. [46] for groomed mass taggers, as well as an analytic study of the optimal parameters.

In Figs. 22 and 23 we overlay the distributions for $D_2^{(2,2)}$, as measured on signal and
background for each Monte Carlo sample, and compare with our analytical calculations
including the non-perturbative shape function contributions. Fig. 22 shows the complete
$D_2$ distributions, including the long tail of the background distribution, while Fig. 23 shows
a zoomed-in version, focusing on small values of $D_2$, as is most relevant for signal versus
background discrimination. A representative cut on the $D_2$ distribution, as could be used
to select a relatively pure sample of boosted Z bosons, is also indicated. In general, the
agreement between the Monte Carlos, for both signal and background distributions, and
our calculation is impressive. This holds true both for the overall shape of the distributions,
including the long tail of the background distribution, and for the detailed shape at small
values of $D_2$. It is also important to note that the perturbative uncertainties remain under
control, even in the small $D_2$ region, as seen in Fig. 23. The uncertainty bands do not
incorporate variations in the non-perturbative parameter $\Omega_D$. There are, however, some
small deviations between the analytic predictions and the Monte Carlo distributions. The
background distribution in Pythia is pushed to slightly higher values than our calculation.
This implies that the signal versus background discrimination power as predicted with
Pythia will be overestimated. The most conservative prediction for the signal versus
background discrimination power is from Herwig, whose background distribution is nearly
identical to our calculation. That Pythia tends to be optimistic and Herwig tends to
be pessimistic with respect to discrimination power has been observed in several other jet
substructure analyses [23, 65-67].

20 Note that Refs. [121, 122] used the observable $C_2$, also formed from the energy correlation functions,
which was proposed in Ref. [65]. Unlike $D_2$, $C_2$ does not cleanly separate the two-prong region of phase
space from the one-prong region of phase space. A detailed discussion of this point can be found in
Ref. [66]. The clean separation of the one- and two-prong regions of phase space is the essential feature
of the $D_2$ observable, which allows for its precise theoretical calculation and its sensitivity to the shower
implementation.

Figure 22: A comparison of signal and background distributions for the four different
Monte Carlo generators and our analytic calculation, including hadronization. Here we
show the complete distributions, including the long tail for the background distribution.
Although we extend the factorization theorem beyond its naive region of applicability into
the tail, excellent agreement with Monte Carlo is found.

Figure 23: A comparison of signal and background $D_2^{(2,2)}$ distributions for the four different
Monte Carlo generators and our analytic calculation, including hadronization. Here
we show a zoomed-in view of the distributions at small $D_2$, along with a representative cut
that could be used to select a relatively pure sample of boosted Z bosons. Relevant cuts
for boosted Z discrimination are to the left of the perturbative peak for the background
distributions.

An important feature of the $D_2$ distributions, made clear by Fig. 23, is that in the
region of interest relevant for boosted Z discrimination, the background distribution is
deep in the non-perturbative regime. Therefore, although the perturbative uncertainties
are small, the effect of the shape function, and variations of the non-perturbative parameter
$\Omega_D$, is large. Estimates of the uncertainties due to the form of the shape function, or the use
of more complicated functional forms, along the lines of Ref. [145] are well beyond the scope
of this paper. An advantage of our factorization approach is that we are able to achieve 
a clean separation of perturbative and non-perturbative effects, and demonstrate relations 
between the non-perturbative matrix elements appearing in our factorization theorems 
and non-perturbative matrix elements which have been measured with other event shapes, 
by using their field theoretic definitions. This separation is essential for understanding 
discrimination performance in the non-perturbative region, which we see is required for 
jet substructure studies related to boosted boson discrimination. Importantly, though, D 2 
seems to take advantage of the different hadronization corrections to signal and background 
jets, and the overlap of the signal and background regions of D 2 decreases significantly in 
going from parton-level to fully hadronized jets. 

In Fig. 24, we have used these raw distributions to produce signal versus background
efficiency curves (ROC curves) by making a sliding cut in $D_2$. The ROC curve from each
Monte Carlo sample, as well as the analytic prediction from our calculated signal and background
distributions, is shown on a logarithmic scale and on a linear scale in Figs. 24a and
24b, respectively. The band around our analytic prediction should be taken as representative
of the signal versus background efficiency range from varying the perturbative scales. 21
For the analytic predictions, we use $\Omega_D = 0.34$ GeV, as obtained from our fit to the $p_\perp$-ordered
Vincia shower. Consistent with the distributions in Fig. 22, the Monte Carlos are in qualitative
agreement with our analytic prediction for the ROC curve. In general, our analytic
prediction seems to give an optimistic prediction for the discrimination power; however,
this is driven by the fact that our resummed prediction for the signal distribution is more
peaked. It would be interesting to perform the NNLL resummation for the signal, which
should significantly reduce the uncertainty in the signal calculation, particularly in the peak
region, where the perturbative uncertainties in our present calculation are quite large.
Because the distributions are normalized, an improved behavior in the peak
of the distribution could also improve the agreement in the tail of the signal distribution,
which is currently systematically low, due to the fact that the peak is systematically high.
This could enable a conclusive understanding of the discrepancy between the different
Monte Carlo generators for both signal and background distributions. In particular, our
analytic calculations suggest that the Herwig++ generator provides pessimistic predictions
for the discrimination power of the $D_2$ observable due to the underestimation of the
peak height for the signal distribution, and it would be interesting to understand this further.
Due to the importance of analytically understanding the discrimination power of jet
substructure observables, such a calculation is well motivated. For the case of $\alpha = \beta = 2$,
the required perturbative components could be obtained following relations to $e^+e^-$ event
shapes as were used in Ref. [41].

21 Note that ROC curves only make sense for normalized distributions, and therefore the envelopes from
scale variation cannot be used. Instead, ROC curves are generated from normalized signal and background
distributions made with a variety of scale choices, with scales varied separately in the signal and background
distributions. We then take the envelope of these ROC curves to generate the uncertainty bands for the
ROC curves.

Figure 24: Signal vs. background efficiency curves for $D_2^{(2,2)}$ for the Monte Carlo samples
as compared to our analytic prediction on a a) logarithmic scale plot and b) linear scale plot.
The band of the analytic prediction is representative of the perturbative scale uncertainty.
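
The sliding-cut construction of the ROC curves, and the envelope over separately scale-varied signal and background distributions described in footnote 21, can be sketched in Python as follows. This is a minimal illustration and not the code used for Fig. 24: the function names, the histogram inputs, and the toy numbers are hypothetical, and a common binning for all histograms is assumed.

```python
def roc_curve(signal_hist, background_hist):
    """ROC points from a sliding upper cut on two histograms with identical binning.

    Both histograms are normalized to unit area, so only their shapes matter.
    Point k is (signal efficiency, background efficiency) when bins 0..k-1 are kept.
    """
    sig_total = float(sum(signal_hist))
    bkg_total = float(sum(background_hist))
    points = [(0.0, 0.0)]
    sig_cum = 0.0
    bkg_cum = 0.0
    for sig_bin, bkg_bin in zip(signal_hist, background_hist):
        sig_cum += sig_bin
        bkg_cum += bkg_bin
        points.append((sig_cum / sig_total, bkg_cum / bkg_total))
    return points


def background_eff_at(curve, target):
    """Linearly interpolate the background efficiency at a target signal efficiency."""
    for (s0, b0), (s1, b1) in zip(curve, curve[1:]):
        if s0 <= target <= s1:
            if s1 == s0:
                return b0
            return b0 + (b1 - b0) * (target - s0) / (s1 - s0)
    raise ValueError("target signal efficiency is outside the range of the curve")


def roc_envelope(signal_variations, background_variations, target):
    """Min and max background efficiency over all pairs of scale-varied histograms.

    Scales are varied separately in signal and background, so every pairing is used.
    """
    values = [
        background_eff_at(roc_curve(sig, bkg), target)
        for sig in signal_variations
        for bkg in background_variations
    ]
    return min(values), max(values)


# Toy histograms (made-up numbers); each list stands for one hypothetical scale choice.
signal_variations = [[4, 3, 2, 1, 0]]
background_variations = [[1, 2, 3, 2, 2], [2, 2, 2, 2, 2]]
low, high = roc_envelope(signal_variations, background_variations, 0.5)
print(low, high)
```

Normalizing inside `roc_curve` reflects the point of footnote 21 that the envelope must be taken over ROC curves rather than over the unnormalized scale-variation bands of the distributions themselves.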

One feature made clear by the linear ROC curve in Fig. 24b is the increase in perturbative
uncertainty with increasing Z efficiency. As emphasized earlier, this is due to the
fact that for the region of interest for Z discrimination, one is probing values of $D_2$ which
are below the peak of the background distribution, and therefore in the non-perturbative
regime. As the Z efficiency is increased, one enters the peak region of the background distribution,
where the perturbative uncertainty is largest, causing a corresponding increase
in the uncertainty band for the ROC curve. However, we do not include uncertainties due
to the non-perturbative parameter $\Omega_D$ or from the shape function in Fig. 24b, which are
the dominant sources of uncertainty in this region.

Figure 25: Signal vs. background efficiency curves for $D_2^{(2,2)}$ for the Monte Carlo samples
as compared to our analytic prediction for two different values of the non-perturbative
shape parameter, chosen by varying our central value by ±0.15 GeV. Results are shown on
a logarithmic scale in a) and a linear scale in b). Perturbative scale uncertainties are also
shown.

To demonstrate that this is indeed the case, in Fig. 25 we show ROC curves in both linear
and log scales for two different values of the non-perturbative shape parameter. The values
of $\Omega_D$ were chosen by varying our central value of $\Omega_D = 0.34$ GeV by ±0.15 GeV (and
rounding to nice numbers). We have also shown the distributions from the Herwig++
and Pythia generators as representative of the ROC curves generated by the Monte Carlo
generators. This figure makes clear that in the region of efficiencies of interest for boosted
Z tagging, one is extremely sensitive to the $D_2$ distribution in the deeply non-perturbative
region, and this uncertainty swamps the perturbative uncertainty. Improving
the accuracy in this region will require detailed comparisons with Monte Carlo, data, and
analytic calculations, to allow for a clean separation of the non-perturbative parameter
from perturbative modifications to the shape of the distribution.

To further understand the discrimination power of the $D_2$ observable, in Figs. 26a
and 26b we show the background rejection rate at 50% and 75% signal efficiency as a
function of $\alpha$, the angular exponent of the 3-point energy correlation function in $D_2$. Below
about $\alpha = 4/3$, all rejection rates dramatically decrease as $\alpha$ decreases, while above about
$\alpha = 4/3$, the QCD rejection rate in all Monte Carlo samples is impressively flat. This is
consistent with our power counting analysis of the $e_2^{(\beta)}, e_3^{(\alpha)}$ phase space plane in Sec. 2.2.1
and is a powerful verification that the Monte Carlos respect the parametric dynamics of
QCD.

Figure 26: Background rejection rate at fixed a) 50% and b) 75% signal efficiency as a
function of the angular exponent of the 3-point energy correlation function in $D_2$, and a
comparison to our analytic prediction for $\alpha = 2$.

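
For Monte Carlo samples, a background rejection rate at fixed signal efficiency can be read off directly from per-jet values of the observable. The Python sketch below shows one way to do this; it is an illustration only. The convention that the rejection rate is the inverse of the background efficiency, the function names, and the toy values are assumptions, since the convention used for Fig. 26 is not spelled out in this section.

```python
import math


def cut_for_signal_efficiency(signal_values, efficiency):
    """Smallest cut such that at least `efficiency` of the signal jets satisfy D2 <= cut."""
    ordered = sorted(signal_values)
    n_keep = max(1, math.ceil(efficiency * len(ordered)))
    return ordered[n_keep - 1]


def background_rejection(signal_values, background_values, efficiency):
    """Assumed convention: rejection = 1 / (background efficiency) at the signal working point."""
    cut = cut_for_signal_efficiency(signal_values, efficiency)
    n_pass = sum(1 for value in background_values if value <= cut)
    if n_pass == 0:
        return float("inf")
    return len(background_values) / n_pass


# Toy per-jet values of the observable (made-up numbers, not simulation output).
signal_d2 = [0.2, 0.4, 0.6, 0.8, 1.0, 1.2, 1.4, 1.6, 1.8, 2.0]
background_d2 = [0.5, 1.5, 2.5, 3.5, 4.5, 5.5, 6.5, 7.5, 8.5, 9.5]

for working_point in (0.5, 0.75):
    print(working_point, background_rejection(signal_d2, background_d2, working_point))
```

Repeating this for samples generated with different angular exponents would give one point per value of $\alpha$ at each working point.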


Although our factorization theorem is valid in the region $\alpha \geq 2$ for $\beta = 2$, in Figs. 26a
and 26b we have only shown the analytic prediction for the value $\alpha = 2$, where we find that
it agrees well with the Monte Carlo results, as expected from the agreement of the distributions
and ROC curves. For $\alpha > 2$, while our prediction for the background distribution
remains accurate (indeed our power counting becomes more valid in this region), the signal
distribution becomes extremely sharply peaked, which is difficult to describe, and sensitive
to normalization. Since this region is also of less phenomenological interest,
both because the large angular exponent makes the observable sensitive to pile-up contamination,
and because both power counting and Monte Carlo analyses indicate that optimal
performance is achieved for $\alpha = 2$, we have decided not to focus on this region. It would
be potentially interesting to see if higher order resummation would be sufficient to describe
the sharply peaked signal distribution in this region, as well as to test the universality of
the non-perturbative power corrections.

One further interesting feature of Figs. 26a and 26b is the correspondence between the
perturbative scale variations, and the spread in the curves from the different Monte Carlo
generators, which agree well at both 50% and 75%. For the case of $p_\perp$-ordered Vincia as
compared with virtuality ordered Vincia, this correspondence is precise, as the difference
between the Monte Carlos can be viewed as a scale variation, and identical hadronization
models are used.

5.7 Discrimination in the Two-Prong Regime 

Throughout this paper, we have emphasized that the discrimination of boosted hadronically
decaying Z bosons (or W or H bosons) from massive QCD jets is effectively a problem
of discriminating one- from two-prong jets. We have demonstrated that the observable $D_2$
is powerful for this goal. However, in the formulation of our factorization theorem for
calculating the distribution of $D_2$, we needed to perform additional 2-point energy correlation
function measurements on the jet to separate the soft subjet and
collinear subjets contributions to the background. While the signal jets are indeed dominantly
two-pronged, we further know that those prongs are dominantly collinear, and do not have
parametrically different energies. Therefore, we are able to further discriminate signal
from background jets in the two-prong region of phase space by exploiting additional measurements
that can isolate the soft subjet and collinear subjet configurations. A detailed
analysis of this is beyond the scope of this paper, but here, we will demonstrate in Monte
Carlo that such a procedure is viable.

Figure 27: Distributions for $e_2^{(0.5)}$ (left) and $e_2^{(1)}$ (right) from the signal and background
Pythia Monte Carlo samples. In addition to the mass cut $m_J \in [80, 100]$ GeV, these
jets are also required to have $D_2^{(2,2)} < 2.5$ to guarantee that they are dominated by
two-prong structure. The parametric boundaries of $e_2^{(\beta)}$ from Eq. (5.18) are shown with
the green dashed lines.

To investigate this, we measure the observable $D_2^{(2,2)}$ on jets on which a tight mass
window cut has been applied. Other angular exponents for $D_2$ can also be used, but here
we only measure $D_2$ to define two-prong jets. We restrict to the two-prong region of phase
space by requiring that $D_2 < 2.5$. Then, on the jets that pass these cuts, we measure
two 2-point energy correlation functions, $e_2^{(2)}$ and $e_2^{(\beta)}$, where $\beta < 2$. As discussed in
Sec. 3, the measurement of the two 2-point energy correlation functions provides an IRC
safe definition of the subjets' energy fractions and splitting angle. Because we make a tight
mass cut on the jets, $e_2^{(2)}$ is essentially fixed, and only $e_2^{(\beta)}$ is undetermined. We will study the
distribution of $e_2^{(\beta)}$ for both signal and background jets in this region of phase space.

For a fixed value of $e_2^{(2)}$ and $\beta < 2$, $e_2^{(\beta)}$ is parametrically bounded as

$$e_2^{(2)} \lesssim e_2^{(\beta)} \lesssim 2^{\beta-2}\left(e_2^{(2)}\right)^{\beta/2}. \qquad (5.18)$$

In the two-prong region, the lower bound is set by the soft subjet while the upper bound
is set by collinear subjets. Therefore, $e_2^{(\beta)}$ for signal jets will peak near $2^{\beta-2}(e_2^{(2)})^{\beta/2}$,
while background QCD jets will fill out the full range. We illustrate this in Fig. 27 on
the hadronized Pythia sample with the appropriate cuts applied. We show plots of the
distributions of $e_2^{(0.5)}$ and $e_2^{(1)}$ on both signal and background jets and have added dotted
lines to denote the parametric upper and lower boundaries. As expected, signal peaks near
the upper boundary and background fills out the entire allowed region and so this additional
information could be used for discrimination. For the very small value $\beta = 0.5$, an $\mathcal{O}(1)$
drift is observed with respect to the parametric boundaries, while for $\beta = 1$, the parametric
boundaries are extremely well respected.
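
The origin of the two boundaries in Eq. (5.18) can be checked with a two-particle toy jet. The Python sketch below assumes two massless particles with energy fractions $z$ and $1-z$ at opening angle $\theta$, for which $e_2^{(\beta)} = z(1-z)\,[2(1-\cos\theta)]^{\beta/2}$; the function names and the chosen configurations are illustrative assumptions, and the lower boundary is only parametric, so it is saturated exactly only for the specific wide angle used here.

```python
import math


def e2_two_prong(z, theta, beta):
    """e_2^(beta) for two massless particles with energy fractions z, 1 - z at angle theta."""
    return z * (1.0 - z) * (2.0 * (1.0 - math.cos(theta))) ** (beta / 2.0)


def parametric_bounds(e2_mass, beta):
    """Lower and upper parametric boundaries of Eq. (5.18) at fixed e_2^(2) = e2_mass."""
    return e2_mass, 2.0 ** (beta - 2.0) * e2_mass ** (beta / 2.0)


beta = 1.0

# Collinear subjets with equal energy sharing sit on the upper boundary.
collinear = e2_two_prong(0.5, 0.3, beta)
_, upper = parametric_bounds(e2_two_prong(0.5, 0.3, 2.0), beta)
print("collinear:", collinear, "upper bound:", upper)

# A soft subjet at wide angle (theta = pi/3, where 2(1 - cos theta) = 1) sits on the lower boundary.
soft = e2_two_prong(0.01, math.pi / 3.0, beta)
lower, _ = parametric_bounds(e2_two_prong(0.01, math.pi / 3.0, 2.0), beta)
print("soft:", soft, "lower bound:", lower)
```

In this toy picture, signal-like jets (two collinear prongs with comparable energies) cluster at the upper boundary, while background-like jets can populate anything between the soft and collinear limits.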

This demonstrates a simple example of an observable which goes beyond the simple
one vs. two prong picture of jet substructure, asking more differential questions about the
subjets themselves. In particular, it could be used both to further improve the discrimination
power of boosted boson discriminants, and to study in detail the QCD properties of
subjets.

6 Looking Back at LEP 

In this section, we consider the $D_2$ distribution for QCD jets in $e^+e^-$ collisions at the Z
pole at LEP, for which a large amount of data exists. While the use of $D_2$ for boosted boson
discrimination is not possible, nor relevant, at LEP, this will emphasize the sensitivity of $D_2$
as a probe of two-prong structure in jets. We will suggest the importance of using variables
sensitive to two emissions off of a primary quark in tuning Monte Carlo generators to LEP
data.

Our definition of the energy correlation functions in Eq. (2.1) makes implicit assumptions
about the treatment of hadron masses, which we have ignored to this point. The
definition given there is an E-scheme treatment of hadron masses [91, 92], but we could
equally well define p-scheme energy correlation functions as:

$$e_2^{(\beta)} = \frac{1}{E_J^2} \sum_{i<j\in J} |\vec{p}_i|\,|\vec{p}_j| \left[2(1-\cos\theta_{ij})\right]^{\beta/2},$$

$$e_3^{(\beta)} = \frac{1}{E_J^3} \sum_{i<j<k\in J} |\vec{p}_i|\,|\vec{p}_j|\,|\vec{p}_k| \left[2(1-\cos\theta_{ij})\,2(1-\cos\theta_{jk})\,2(1-\cos\theta_{ik})\right]^{\beta/2}, \qquad (6.1)$$

where $\vec{p}_i$ denotes the three-momentum of particle $i$. For massless particles, this definition is
identical to that of Eq. (2.1), and so our perturbative analytics would be unchanged by using
this definition or the definition of Eq. (2.1). 22 The definitions of Eq. (2.1) and Eq. (6.1)
differ for massive particles. In particular, the energy correlation functions as defined in
Eq. (6.1) have the advantage that they vanish for low momentum or collimated particles
regardless of whether these particles are massless or massive, which is not true of the
definition in Eq. (2.1). Because of this, we expect that the energy correlation functions as
defined in Eq. (6.1) are less sensitive to hadron mass effects and that kinematic restrictions
on the energy correlation functions remain the same before and after hadronization, so
that the phase space studied in Sec. 2.2 assuming massless particles is not significantly
modified.

22 As will be discussed shortly, the differences in our analytic calculation due to hadron masses will arise
through non-perturbative effects, namely the shape function.
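
The p-scheme definitions of Eq. (6.1) translate directly into code. The following Python sketch evaluates the 2- and 3-point energy correlation functions from a list of particle three-momenta, and forms the ratio $D_2^{(\alpha,\beta)} = e_3^{(\alpha)}/(e_2^{(\beta)})^{3\alpha/\beta}$. The function names and input layout are hypothetical, the form of the ratio is recalled from the definition of the observable rather than taken from this section, and all momenta are assumed to be non-zero.

```python
import math
from itertools import combinations


def _norm(p):
    """Magnitude of a three-momentum given as (px, py, pz)."""
    return math.sqrt(p[0] ** 2 + p[1] ** 2 + p[2] ** 2)


def _angle_factor(p, q):
    """2 (1 - cos theta_pq); clipped at zero to protect against rounding for collinear pairs."""
    cos_theta = (p[0] * q[0] + p[1] * q[1] + p[2] * q[2]) / (_norm(p) * _norm(q))
    return max(0.0, 2.0 * (1.0 - cos_theta))


def ecf2(momenta, jet_energy, beta):
    """p-scheme 2-point energy correlation function of Eq. (6.1)."""
    total = 0.0
    for p, q in combinations(momenta, 2):
        total += _norm(p) * _norm(q) * _angle_factor(p, q) ** (beta / 2.0)
    return total / jet_energy ** 2


def ecf3(momenta, jet_energy, beta):
    """p-scheme 3-point energy correlation function of Eq. (6.1)."""
    total = 0.0
    for p, q, r in combinations(momenta, 3):
        angles = _angle_factor(p, q) * _angle_factor(q, r) * _angle_factor(p, r)
        total += _norm(p) * _norm(q) * _norm(r) * angles ** (beta / 2.0)
    return total / jet_energy ** 3


def d2(momenta, jet_energy, alpha=2.0, beta=2.0):
    """Assumed form of the ratio: e_3^(alpha) / (e_2^(beta))^(3 alpha / beta)."""
    return ecf3(momenta, jet_energy, alpha) / ecf2(momenta, jet_energy, beta) ** (3.0 * alpha / beta)


# Toy configuration: three massless particles of unit momentum along the coordinate axes.
particles = [(1.0, 0.0, 0.0), (0.0, 1.0, 0.0), (0.0, 0.0, 1.0)]
print(ecf2(particles, 3.0, 2.0), ecf3(particles, 3.0, 2.0), d2(particles, 3.0))
```

Since only momentum magnitudes and pairwise angles enter, a massive particle with vanishing momentum contributes nothing to either sum, which is the property of the p-scheme emphasized above.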

At LEP energies, hadronization will also have a larger effect on the $D_2$ spectrum than
at 1 TeV. However, a particularly important aspect of our all orders factorization theorem
is that it isolates perturbative and non-perturbative physics contributions. In this section
we will again implement non-perturbative effects into our analytic calculation using the
shape function defined in Eq. (5.2). There are two effects which determine how the shape
function depends on the jet mass, $m_J$, and the center of mass energy, $Q$. First, for a fixed
value of $e_2^{(2)}$, the shift in the first moment of the $D_2$ distribution was given in Eq. (5.16),
which we recall here for convenience, by



This has dependence on both $m_J$ and $Q$ (through $E_J$), and for the jets we consider at
LEP, this is a considerably larger shift than for the 1 TeV jets studied in Sec. 5.5. This
scaling is a non-trivial prediction of our factorization framework, and we will see that it
is well respected when we perform a comparison of our analytic results with Monte Carlo.
Furthermore, the parameter $\Omega_D$ has a logarithmic dependence on a renormalization scale,
$\Omega_D = \Omega_D(\mu)$, through renormalization group evolution [92], which is briefly reviewed in
App. H. However, this effect is small compared with the linear change in the first moment
with $E_J$ for a fixed $m_J/E_J$. A numerical estimate for the effect of the running of $\Omega_D$ is
given in App. H. At the level of accuracy to which we work in this paper, we cannot probe
this logarithmic running, although we will see that our results are consistent with it.

The definition of the energy correlation functions given in Eq. (6.1) also has an effect
on the universality of the non-perturbative parameter $\Omega_D$, when hadron mass effects are
included. Power corrections due to hadron mass effects are of order $\mathcal{O}(m_H/Q)$, where $m_H$
is a light hadron mass, and are therefore of the same order as the leading $\mathcal{O}(\Lambda_{\rm QCD}/Q)$
power corrections. In the p-scheme definition of the energy correlation functions which
we have chosen in Eq. (6.1), it is no longer possible to extract the dependence on the
angular exponent $\alpha$ from $\Omega_D$ as was done in Eq. (5.12). However, to the accuracy to
which we work, we expect this to be a negligible effect, and furthermore, the case $\alpha = 2$
is of most phenomenological interest, and is the case we have focused on exclusively in
this paper. Furthermore, even in the presence of hadron mass effects, it is still possible
to extract the parameter $\Omega_D$ from dijet event shapes in the same universality class [92].
This exhibits the benefits of the factorization approach both for separating perturbative
and non-perturbative effects, and for relating non-perturbative parameters to maintain
predictivity.

One further distinction between the case of boosted Z discrimination and the measurement
of QCD jet shapes at the Z pole is that while a tight mass cut is natural for boosted
Z discrimination, it is not natural in jet shape analyses. However, our shape function
analysis, as derived in Sec. 5.5, is valid at a fixed jet mass (or correspondingly fixed value
of $e_2^{(2)}$). This is clear from both Eq. (5.7) and from the equation for the shift in the first
moment in Eq. (5.16). However, we emphasize that the non-perturbative parameter $\Omega_D$ is
unique, and the scaling of the non-perturbative shift with the jet mass is fully determined.
To achieve an analytic prediction for the non-perturbative $D_2$ spectrum inclusive over the
jet mass, one must calculate the perturbative $D_2$ spectra differentially in the jet mass,
convolve with a shape function for each value of the jet mass, and then integrate over the
jet mass. While this is in principle straightforward, it is computationally intensive, and is
beyond the scope of this paper. Instead, we will enforce a jet mass cut of $8 < m_J < 16$
GeV. This mass cut was chosen because it is near the Sudakov peak of the jet mass
distribution for this jet energy, and the values of $m_J$ in this range are set by low scale, but still
perturbative, emissions.

Figure 28: A comparison of the $D_2$ spectrum as measured on quark initiated jets at the
Z pole from the Pythia and $p_\perp$-ordered Vincia Monte Carlo generators to our analytic
predictions. a) Comparison of our complete analytic calculation including both the soft
subjet and collinear subjets region of phase space with the predictions of the Monte Carlo
generators. b) Comparison of our analytic calculation including only the collinear subjets
region of phase space compared with the predictions of the Monte Carlo generators. The
pinch in the scale variations is a consequence of unit normalizing the distributions.

Similar to what we did in our numerical analysis at 1 TeV, we begin in Fig. 28 by
comparing our analytic prediction for the $D_2$ spectrum with the distributions from parton
level Monte Carlo. In Fig. 28a, we show a comparison of our complete analytic calculation,
including perturbative scale variations, along with Monte Carlo predictions from both
Pythia and $p_\perp$-ordered Vincia, which we take as representative of the different Monte
Carlo generators. We use a jet radius of R = 1.4 to approximate hemisphere jets. We
find good agreement with the predictions of the Vincia Monte Carlo. It is important to
emphasize, however, that at LEP energies, non-perturbative effects are large, and therefore
a comparison with parton level Monte Carlo is difficult due to large uncertainties in the
treatment of the shower cutoff. We also show, in Fig. 28b, a comparison of our analytic
prediction, including only the collinear subjets region of phase space, with both Monte
Carlo generators. The difference between the analytic predictions in Fig. 28a and Fig. 28b
emphasizes the large effect played by the soft subjet at LEP energies. Unfortunately, due
to large hadronization corrections, the treatment in Monte Carlo of the soft subjet region
is difficult to disentangle from the treatment of non-perturbative physics.

In Fig. 29b we show our analytic prediction for the non-perturbative spectra using the
shape function. An alternate view of the perturbative spectrum is shown in Fig. 29a for
reference, and to show the overall shape of the perturbative distribution. We have used
a value of $\Omega_D = 0.50$ GeV, which was obtained by fitting to the Vincia Monte Carlo.
There is considerable uncertainty on this value, probably of the order ±0.3 GeV, due to the
wide mass window, which is probably slightly large for the naive application of our shape
function. Furthermore, as demonstrated in Sec. 5.5, there is some ambiguity in the value
of $\Omega_D$ depending on whether it is extracted from hadron level Pythia or Vincia, which
is of this same order. However, this value is consistent with $\Omega_D = 0.34$ GeV as extracted
from our analysis at 1 TeV. Although it is expected that the logarithmic running of the $\Omega_D$
parameter will decrease its value slightly, this effect is expected to be small. The amount
by which it is expected to decrease depends on another non-perturbative parameter, but
it is estimated in App. H that $\Omega_D$ should decrease by approximately 0.1 GeV between our
predictions at 1 TeV and those at LEP energies. This is an important consistency check
on our results, but due to the large uncertainty, we cannot claim to probe this running
over the scales that we have considered. The analytic perturbative spectrum is also shown
for reference. Good overall agreement with both Monte Carlo generators is observed,
and the discrepancy between the Pythia and Vincia generators which was present at
parton level is reduced, although still non-negligible. As was discussed in Sec. 5.5, it could
also be compensated for by a modification of the non-perturbative shape parameter. In
particular, the effect of hadronization is well captured by the non-perturbative shape function.
Hadronization has a significantly larger effect on the $D_2$ observable at Z pole energies than
at 1 TeV. This demonstrates the consistency of our implementation of the non-perturbative
corrections through the shape function, which predicts the scaling of the shift in the first
moment through Eq. (6.2).
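
As a rough illustration of what it means for a one-parameter shape function to be controlled by its first moment, the sketch below assumes the commonly used form $F(\epsilon) = (4\epsilon/\Omega_D^2)\, e^{-2\epsilon/\Omega_D}$. This functional form is an assumption made here for illustration (Eq. (5.2) is not reproduced in this section), and the function names are hypothetical. The code numerically checks that this $F$ is unit normalized and has first moment $\Omega_D$.

```python
import math

def shape_function(eps, omega):
    """Assumed model: F(eps) = (4 eps / omega^2) * exp(-2 eps / omega)."""
    return 4.0 * eps / omega**2 * math.exp(-2.0 * eps / omega)

def moment(order, omega, upper=40.0, steps=20_000):
    """Trapezoidal estimate of the integral of eps^order * F(eps) over [0, upper * omega]."""
    h = upper * omega / steps
    total = 0.0
    for i in range(steps + 1):
        eps = i * h
        weight = 0.5 if i in (0, steps) else 1.0
        total += weight * eps**order * shape_function(eps, omega)
    return total * h

if __name__ == "__main__":
    for omega in (0.34, 0.50):  # GeV, the two values quoted in the text
        print(omega, moment(0, omega), moment(1, omega))
```

Under this assumed form, changing $\Omega_D$ rescales the whole function, so a fit of the first moment fixes the shape as well; a more general shape function would decouple the two.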

Unlike for the $D_2$ distributions at 1 TeV, where the effect of hadronization was well
described by a shift in the first moment alone, at LEP energies the hadronization also has a
non-trivial effect on the shape of the distribution. This can clearly be seen by comparing
the dashed perturbative spectrum and the non-perturbative results in Fig. 29b. While
our factorization of non-perturbative effects in terms of a shape function is completely
generic, it is only the first moment of the shape function which is universal, with the
full non-perturbative shape function being in general observable dependent. However, the
modification in the shape of the $D_2$ spectrum due to hadronization effects seems to be
quite well captured by the shape function of Eq. (5.2). In our plots we do not include any
uncertainties due to the form of the non-perturbative shape function, despite the fact that 
they are the dominant effect throughout most of the hadronized distribution. More general
shape functions, and their associated uncertainties, could be studied along the
lines of Ref. [145], although this is beyond the scope of this paper, and could only be
justified if the perturbative components of our calculation were computed to a higher level
of accuracy.

Figure 29: A comparison of the $D_2$ spectrum as measured on quark initiated jets at the
Z pole from the Pythia and $p_T$-ordered Vincia Monte Carlo generators to our analytic
predictions. Results are shown both for parton level Monte Carlo compared with perturbative
analytics in a), and for hadron level Monte Carlo compared with non-perturbative
analytics in b). The pinch in the scale variations is a consequence of unit normalizing the
distributions.

Since the $D_2$ spectrum is sensitive to the emissions from the gluon subjet, it is sensitive
to the radiation pattern generated by a gluon, and could potentially be used to improve the
Monte Carlo description of gluons and the modeling of color coherence effects. In contrast
to most observables which have been used for tuning Monte Carlos to LEP data, such as
the jet mass which is set by a single emission, $D_2$ requires two emissions off the initiating
quark to be non-zero, and therefore can be used as a more detailed probe of the perturbative
shower. Although non-perturbative effects play a large role for jets in this energy range,
we have shown that our factorization theorem allows us to cleanly separate perturbative
from non-perturbative effects, which could be useful when tuning Monte Carlo generators,
allowing one to disentangle genuine perturbative effects which should be well described
by the Monte Carlo shower, from effects which should be captured by the hadronization
model. We believe that higher order calculations of QCD jet shapes sensitive to three
particle correlations, such as $D_2$, and their use in Monte Carlo tunings are therefore well
motivated.

For reference, in App. I we show a collection of $e_2^{(2)}$ distributions measured at the Z
pole, at both parton and hadron level for both the Vincia and Pythia event generators.
Unlike for the $D_2$ observable, the Vincia and Pythia generators agree both at parton and
hadron level to an excellent degree. This is of course expected due to the fact that these
Monte Carlos have been tuned to LEP event shapes, but further emphasizes the fact that
$D_2$, and other observables sensitive to additional emissions, provide a more detailed probe
of the perturbative shower.

7 Looking Towards the LHC 

Throughout this paper, we have restricted our analysis to $e^+e^-$ colliders so that we could
ignore subtleties with initial state radiation, pile-up and other features important at hadron
colliders. However, it is precisely for including these effects that a rigorous factorization
based approach to jet substructure, such as that presented in this paper, will prove most
essential. In this section, we discuss the extension to the LHC and in particular to what
extent conclusions for $e^+e^-$ colliders hold for the LHC.

The energy correlation functions have a natural longitudinally-invariant generalization
relevant for pp colliders, which is given by [65, 66]

$$ e_2^{(\beta)} = \frac{1}{p_{TJ}^2} \sum_{1 \le i < j \le n_J} p_{Ti} \, p_{Tj} \, R_{ij}^{\beta} , $$

$$ e_3^{(\beta)} = \frac{1}{p_{TJ}^3} \sum_{1 \le i < j < k \le n_J} p_{Ti} \, p_{Tj} \, p_{Tk} \, R_{ij}^{\beta} R_{ik}^{\beta} R_{jk}^{\beta} . \qquad (7.1) $$

Here $p_{TJ}$ is the transverse momentum of the jet with respect to the beam, $p_{Ti}$ is the
transverse momentum of particle $i$, and $n_J$ is the number of particles contained in the jet.
The boost-invariant angle $R_{ij}^2 = (\phi_i - \phi_j)^2 + (y_i - y_j)^2$ is defined as the Euclidean distance
in the azimuth-rapidity plane. For central rapidity jets, which we will restrict ourselves to
in this section, the power counting discussion of Sec. 2 is unmodified. Therefore, the same
conclusions for the form of the optimal observable, $D_2$, as well as the range of angular
exponents, apply. A simplified version of the $D_2^{(\alpha,\beta)}$ variable, restricted to have equal
angular exponents $\alpha = \beta$, was used in Ref. [66] for jet substructure studies at the LHC.
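
To make the pp definitions concrete, the following is a minimal sketch that evaluates the two- and three-point energy correlation functions of Eq. (7.1) and the ratio $D_2^{(\alpha,\beta)} = e_3^{(\alpha)} / (e_2^{(\beta)})^{3\alpha/\beta}$ on a list of jet constituents. The function names and the `(pT, y, phi)` tuple layout are illustrative choices, and defaulting the jet $p_T$ to the scalar sum of constituent $p_T$ is an assumption made only for this example.

```python
import math
from itertools import combinations

def delta_r(p, q):
    """Distance in the rapidity-azimuth plane for particles given as (pT, y, phi)."""
    dphi = abs(p[2] - q[2]) % (2.0 * math.pi)
    if dphi > math.pi:
        dphi = 2.0 * math.pi - dphi
    return math.hypot(p[1] - q[1], dphi)

def energy_correlators(particles, beta, pt_jet=None):
    """Return (e2, e3) with angular exponent beta, following the form of Eq. (7.1)."""
    if pt_jet is None:
        # Assumption for this sketch: use the scalar pT sum as the jet pT.
        pt_jet = sum(p[0] for p in particles)
    e2 = sum(a[0] * b[0] * delta_r(a, b) ** beta
             for a, b in combinations(particles, 2)) / pt_jet**2
    e3 = sum(a[0] * b[0] * c[0]
             * (delta_r(a, b) * delta_r(a, c) * delta_r(b, c)) ** beta
             for a, b, c in combinations(particles, 3)) / pt_jet**3
    return e2, e3

def d2(particles, alpha=2.0, beta=2.0, pt_jet=None):
    """D2 = e3^(alpha) / (e2^(beta))^(3 alpha / beta)."""
    _, e3 = energy_correlators(particles, alpha, pt_jet)
    e2, _ = energy_correlators(particles, beta, pt_jet)
    return e3 / e2 ** (3.0 * alpha / beta)

if __name__ == "__main__":
    jet = [(50.0, 0.0, 0.0), (30.0, 0.3, 0.0), (20.0, 0.0, 0.4)]
    print(energy_correlators(jet, 2.0), d2(jet))
```

The triple sum scales as the cube of the number of constituents, which is acceptable for a single jet but would call for a dedicated implementation in a real analysis.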

It is in principle straightforward to extend the factorization theorems for $D_2$ to hadronic
colliders, where $D_2$ is measured on a single jet in an exclusive $N$-jet event. Factorization
theorems for exclusive $N$-jet production defined using $N$-jettiness [95, 160] or with a
$p_T$-veto [161, 162] on additional radiation exist and could be combined with the factorization
theorems of Sec. 3 to describe the jet substructure. We now briefly discuss how each of
these factorization theorems can be interfaced with the presence of additional eikonal lines,
representing either additional jets or beam directions in pp collisions.

Recall from Sec. 3.1.1 that the collinear subjets factorization theorem is formulated as
a refactorization of the jet function for a particular jet in the $n$ direction, and it is therefore
insensitive to the global color structure of the event, seeing only the total color. Intuitively,
the collinear-soft modes are boosted, and therefore all additional Wilson lines in the event
are grouped in the $\bar{n}$ direction. Furthermore, the global soft modes, which resolve the
global color structure of the event, do not resolve the jet substructure. As a consequence, the
collinear subjets factorization theorem can be trivially combined with
a factorization theorem with an arbitrary number of eikonal lines, without complicating
the color structure. All that is then required, apart from the substructure components, is
the inclusion of an additional measurement function in the global soft function. Indeed,
this extension has been discussed in detail in Ref. [77]. This same property is of course
also true for the soft haze factorization theorem, as no additional Wilson lines are required
to describe the jet substructure in the first place.

Figure 30: A comparison of the $D_2^{(2,2)}$ distributions for signal and background jets. a)
Distributions for R = 1 jets at a 1 TeV $e^+e^-$ collider. b) Distributions for R = 1 jets at
the 13 TeV LHC, for jets with transverse momenta in the range $p_T \in [450, 550]$ GeV.

However, for the soft subjet factorization theorem, the presence of additional Wilson 
lines does significantly complicate the factorization from a calculational perspective. In 
particular, since the subjet is soft, arising from a refactorization of the soft function, it 
is emitted coherently from the $N$-eikonal line structure as a whole, requiring a proper
treatment of all color correlations, which becomes complicated with even a few additional
Wilson lines. A conjectural proposal for the all orders soft subjet factorization theorem
with $N$-eikonal lines was given in Ref. [76], where the soft subjet factorization theorem
was first proposed and studied in the large $N_c$ limit. However, more work is required to
understand its structure, and to find an efficient organization of the color correlations at finite
$N_c$. Furthermore, for the soft subjet factorization theorem, the final soft function has an
additional eikonal line, since the jet substructure is resolved by the long wavelength global 
soft modes, further complicating the calculation (although there has recently been some 
progress in the computation of soft functions [163, 164]). We emphasize however, that 
these are purely technical complications, and believe that the extension to a calculation 
of jet substructure in pp would be well worthwhile for improving our understanding of 
analytic jet substructure. Furthermore, depending on the relevant boosts and jet radii, 
the techniques of this paper could be used to identify whether the soft subjet factorization 
theorem plays an important role, or could be formally neglected, simplifying the calculation 
in more complicated cases.

For these reasons, a full calculation in pp is well beyond the current scope of this initial
investigation. We will instead restrict ourselves to a brief Monte Carlo study comparing the
properties of $D_2$ in $e^+e^-$ and pp to show that the distributions exhibit similar features. In
Fig. 30 we compare the Monte Carlo predictions for $D_2^{(2,2)}$ as measured in $e^+e^-$ collisions,
shown in Fig. 30a, and pp collisions, shown in Fig. 30b. For $e^+e^-$ collisions, the event
selection is identical to that used earlier. For pp collisions, we generate background events from the
parton-level process $pp \to qq$ and signal events from $pp \to ZZ \to qqqq$ events, where q
denotes a massless quark, with Pythia 8.205 at the 13 TeV LHC.$^{23}$ Jets are clustered with
the anti-$k_T$ algorithm with radius R = 1.0, using the WTA recombination scheme
with a $p_T$ metric. We cut on the transverse momentum of the hardest jet, requiring
$p_T \in [450, 550]$ GeV, and on the jet mass, requiring $m_J \in [80, 100]$ GeV. These are chosen
to be similar to the cuts on the jets for the case of $e^+e^-$, although they are of course
not identical, and strict comparisons should not be made between the two cases. The
shapes and general features of the $D_2$ distributions at the two colliders are very similar.
There is a relative scaling between the $D_2$ distributions in $e^+e^-$ and pp due to the different
observable definitions. The $e^+e^-$ definition uses the $1 - \cos(\theta_{ij})$ measure of Eq. (2.1), while
the pp definition uses the boost invariant definition in terms of $R_{ij}$, as in Eq. (7.1). Since
the $e_3$ observable correlates particles of separation up to $2R$, where R is the jet radius,
for $\alpha = \beta = 2$, this gives an expected factor of 4 difference between the two cases, as is
approximately observed in Fig. 30.

The similar behavior of the $e^+e^-$ and pp distributions suggests that a complete calculation
using our techniques would provide an excellent description of the $D_2$ distribution
at a hadron collider, as we have found for $e^+e^-$. Such a calculation would also be interesting
to better understand the effects of initial state radiation on the $D_2$ distribution. A
simple setting where this calculation would be feasible, for example, would be to consider
measuring the $D_2$ distribution on a jet recoiling against a color-singlet such as a $W$, $Z$
or $H$ boson, as was used in Ref. [62] to perform an NNLL calculation of the jet mass. Although
the effects of non-global logarithms would need to be understood, and could play
an important role, recent progress in this area suggests that this issue could be addressed,
either by direct resummation of the NGLs [76, 165-168], or through the use of jet grooming
algorithms which remove NGLs [43, 44, 71]. While it is truly uncorrelated with the jet, the
effect of radiation from pile-up on $D_2$ could also be mitigated using similar jet grooming
algorithms.

23 Since we only briefly mention the case of pp colliders, we do not perform a systematic study of the
variation of the $D_2$ distribution in pp with different Monte Carlo generators, as we did for the case of $e^+e^-$.
However, we believe that this is essential in any jet substructure study at pp, as we expect variations will be
present, as in the $e^+e^-$ case. It would be particularly interesting to compare a $p_T$-ordered dipole-antenna
shower, such as was recently implemented for pp in Dire [131], with the Pythia and Herwig++ generators
which are more commonly used in jet substructure studies at the LHC.

8 Conclusions

In this paper we have presented a novel approach to the factorization of jet substructure
observables, and applied it to the identification of two-prong substructure. Instead of
starting with a given two-prong discriminant, we used the energy correlation functions as a
basis of IRC safe observables to isolate the possible subjet configurations. We then studied 
the phase space defined by these IRC safe observables and proved all orders factorization 
theorems in each region of phase space. This procedure naturally identified an observable, 
$D_2$, which we argued provided optimal discrimination power, and which preserved the
factorization properties of the individual factorization theorems describing different regions of
the phase space defined by our basis of observables. We showed that a factorized description
of this observable could be obtained by merging the different factorization theorems,
and introduced a novel zero-bin procedure in factorization theorem space to implement
this merging. An important benefit of this approach is that our factorization theorems are
valid to all orders in $\alpha_s$ at leading power and therefore provide a systematically improvable
description of the observable.

Using our factorized description of the $D_2$ observable, we presented a numerical study of
our results at an $e^+e^-$ collider, for both the signal and background distributions, resulting in
analytic boosted Z boson versus massive QCD jet discrimination predictions. We compared
with a variety of Monte Carlo generators, and demonstrated that the low $D_2$ region, where
a hard two-prong substructure is resolved, is a sensitive probe of the Monte Carlo parton 
shower description. We also studied the effect of non-perturbative corrections, showing that 
they can be well-described using a simple shape function, and related the single parameter 
of this shape function to a universal non-perturbative matrix element measured at LEP. 
This is vital for comparing our calculation with data. 

Because our calculation presents the first factorized description of a two-prong discriminant
jet observable in both signal and background regions, there are a large number of
directions for future study which are of great interest. First, our calculation was presented
in the context of jets produced in $e^+e^-$ collisions. For applications at the LHC, where jet
substructure plays a vital role, it is important to extend the calculation to jets produced
at a pp collider. The factorization theorem we presented straightforwardly generalizes to
pp colliders with only complications due to soft radiation from the beams and the more
complicated color structure of the hard interaction. The treatment of both of these effects
is well understood, and their inclusion in a jet substructure calculation would allow the
first precision comparisons of calculations with data.

An interesting potential application of our factorization theorems, and merging procedures,
which describe in a more differential way the substructure of jets, is to improve
jet shape based subtraction schemes for QCD calculations at NNLO and beyond. Quite
recently, subtractions based on the $N$-jettiness observable [95] have been used to perform
NNLO calculations in QCD [169-171]. This allowed, in particular, the calculation of $W$,
$H$ + 1 jet at NNLO [169, 170] ($H$ + 1 jet at NNLO was also calculated using more traditional
subtraction techniques in [172]). The use of more differential subtractions based on more 
differential factorization theorems would allow for more local, and potentially numerically 
more efficient subtractions. 

It would also be interesting to apply our calculation approach to other observables. For 
example, the $N$-subjettiness observables [63, 64] are used extensively in jet substructure
studies at the LHC, and it would be of significant phenomenological relevance to obtain
a factorized description of these observables. The approach presented in this paper could
also be extended to study more differential observables, such as those used for boosted top
discrimination, which can resolve three subjets. A generalization of the $D_2$ observable,
$D_3^{(\alpha,\beta,\gamma)}$, which resolves three-prong structure, was introduced in Ref. [67] (see also Ref. [173]
where it was used for boosted top discrimination at a 100 TeV collider). The $D_3^{(\alpha,\beta,\gamma)}$
observable should exhibit similar factorization properties to that of $D_2$, and hence should
be calculable with similar techniques. A rigorous factorization will also prove essential in 
this case, allowing for the separation of perturbative and non-perturbative physics, as well 
as effects associated with the finite top width [111, 174]. More generally, we anticipate that 
the approach to the factorization of jet substructure observables presented in this paper 
will allow for the construction of more powerful jet substructure discriminants and will 
enable a more detailed analytic understanding of the substructure of high energy QCD 
jets.

A Conventions and SCET Notation

In the body of the text we have presented the required factorization theorems for studying 
the two-prong substructure of jets using the $D_2$ observable. Although all the factorization
theorems were presented, only heuristic descriptions of the functions appearing in the 
factorization theorems were presented in an attempt to appeal to a broader audience, and 
so as to not distract the reader with technical complications. In these appendices, we give 
the operator definitions of the functions appearing in the factorization theorems of Sec. 3, 
and calculate the functions to one-loop accuracy. 

In this appendix we begin by summarizing some notation and conventions. The factorization
theorems presented in this paper are formulated in the language of SCET [79-82].
We assume that the reader has some familiarity with the subject, and will only define our
particular notation, and review the definition for common SCET objects. We refer readers
unfamiliar with SCET to the reviews [175, 176].

SCET is formulated as a multipole expansion in the momentum components along the
jet directions. Since we take the jet directions to be lightlike, it is convenient to work in
terms of light-cone coordinates. We define two light-cone vectors

$$ n^\mu = (1, \vec{n}) , \qquad \bar{n}^\mu = (1, -\vec{n}) , \qquad (A.1) $$

with $\vec{n}$ a unit three-vector, which satisfy the relations $n^2 = \bar{n}^2 = 0$ and $n \cdot \bar{n} = 2$. We can
then write any four-momentum $p$ as

$$ p^\mu = \bar{n} \cdot p \, \frac{n^\mu}{2} + n \cdot p \, \frac{\bar{n}^\mu}{2} + p^\mu_{n\perp} . \qquad (A.2) $$

A particle in the $n$-collinear sector has momentum $p$ close to the $\vec{n}$ direction, so that its
momentum scales like $(n \cdot p, \bar{n} \cdot p, p_{n\perp}) \sim \bar{n} \cdot p \, (\lambda^2, 1, \lambda)$, with $\lambda \ll 1$ a small parameter.
The parameter $\lambda$ is a generic substitute for the power counting parameters in the different
factorization theorems presented in Sec. 3, and since our factorization theorems involve
multiple scales, there are generically multiple distinct $\lambda$s.
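
The light-cone decomposition of Eq. (A.2) can be checked numerically. The sketch below is an illustration with hypothetical function names, using the mostly-minus metric; it extracts $n \cdot p$, $\bar{n} \cdot p$ and $p_{n\perp}$ for a given unit vector $\vec{n}$ and verifies the identity $p^2 = (n \cdot p)(\bar{n} \cdot p) + p_{n\perp}^2$.

```python
def mdot(a, b):
    """Minkowski product with signature (+, -, -, -)."""
    return a[0] * b[0] - a[1] * b[1] - a[2] * b[2] - a[3] * b[3]

def lightcone_components(p, n_hat):
    """Return (n.p, nbar.p, p_perp) for four-momentum p and unit three-vector n_hat."""
    n = (1.0, n_hat[0], n_hat[1], n_hat[2])
    nbar = (1.0, -n_hat[0], -n_hat[1], -n_hat[2])
    n_dot_p = mdot(n, p)
    nbar_dot_p = mdot(nbar, p)
    # p_perp is what remains after removing the n and nbar components.
    p_perp = tuple(p[i] - 0.5 * nbar_dot_p * n[i] - 0.5 * n_dot_p * nbar[i]
                   for i in range(4))
    return n_dot_p, nbar_dot_p, p_perp

if __name__ == "__main__":
    p = (5.0, 0.3, 0.4, 4.9)  # a momentum nearly collinear to the z axis
    n_dot_p, nbar_dot_p, p_perp = lightcone_components(p, (0.0, 0.0, 1.0))
    print(n_dot_p, nbar_dot_p, p_perp)
    print(mdot(p, p), n_dot_p * nbar_dot_p + mdot(p_perp, p_perp))
```

For this example momentum the small component $n \cdot p$ is roughly a factor of 100 below $\bar{n} \cdot p$, which is the hierarchy $(\lambda^2, 1, \lambda)$ described above with $\lambda \sim 0.1$.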

In the effective field theory, the momenta of the particles in the $n$-collinear sector
are multipole expanded, and written as

$$ p^\mu = \tilde{p}^\mu + k^\mu = \bar{n} \cdot \tilde{p} \, \frac{n^\mu}{2} + \tilde{p}^\mu_{n\perp} + k^\mu , \qquad (A.3) $$

where $\bar{n} \cdot \tilde{p}$ and $\tilde{p}_{n\perp}$ are large momentum components, which label fields, while $k$ is a small
residual momentum, suppressed by powers of $\lambda$. This gives rise to an effective theory
expansion in powers of $\lambda$.

SCET fields for quarks and gluons in the $n$-collinear sector, $\xi_{n,\tilde{p}}(x)$ and $A_{n,\tilde{p}}(x)$, are
labeled by the lightlike vector of their collinear sector, $n$, and their large momentum $\tilde{p}$. We
will write the fields in a mixed position space/momentum space notation, using position
space for the residual momentum and momentum space for the large momentum components.
The residual momentum dependence can be extracted using the derivative operator
$i\partial^\mu \sim k$, while the large label momentum is obtained from the momentum label operator
$\mathcal{P}_n^\mu$.

Operators and matrix elements in SCET are constructed from collinearly gauge-invariant
quark and gluon fields, defined as [79, 80]

$$ \chi_{n,\omega}(x) = \left[ \delta(\omega - \bar{\mathcal{P}}_n) \, W_n^\dagger(x) \, \xi_n(x) \right] , \qquad (A.4) $$

$$ \mathcal{B}^\mu_{n,\omega\perp}(x) = \frac{1}{g} \left[ \delta(\omega + \bar{\mathcal{P}}_n) \, W_n^\dagger(x) \, i D^\mu_{n\perp} W_n(x) \right] . \qquad (A.5) $$

The $\perp$ derivative in the definition of the SCET fields is defined using the label momentum
operator as

$$ i D^\mu_{n\perp} = \mathcal{P}^\mu_{n\perp} + g A^\mu_{n\perp} , \qquad (A.6) $$

and

$$ W_n(x) = \left[ \sum_{\text{perms}} \exp\left( -\frac{g}{\bar{\mathcal{P}}_n} \, \bar{n} \cdot A_n(x) \right) \right] , \qquad (A.7) $$

is a Wilson line of $n$-collinear gluons. We use the common convention that the label
operators in the definition of the SCET fields only act inside the square brackets. Although
the Wilson line $W_n(x)$ is a non-local operator, it is localized with respect to the residual
position $x$, and we can therefore treat $\chi_{n,\omega}(x)$ and $\mathcal{B}^\mu_{n,\omega\perp}(x)$ as local quark and gluon fields
when constructing operators. The operator definitions for jet functions in these appendices
are given in terms of these collinear gauge invariant quark and gluon SCET fields.

Our operator definitions will also involve matrix elements of eikonal Wilson lines, 
which arise from the soft-collinear factorization through the BPS field redefinition at the 
Lagrangian level [81]. The Wilson lines extend from the origin to infinity along the direction 
of a lightlike vector, q, specifying their directions. Explicitly 



(A.8) 


Here P denotes path ordering, and A is the appropriate gauge field for any sector which 
couples eikonally to a collinear sector with label q (for example collinear-soft, soft, boundary 
soft), and the color representation has been suppressed. All Wilson lines are taken to be 
outgoing, since we consider the case of jet production from $e^+e^-$ collisions.

Throughout this paper we have considered the production of two jets, one of which 
has a possible two-prong substructure, in an $e^+e^-$ collider. This implies the presence of
at most three Wilson lines in the soft or collinear-soft function. With only three Wilson
lines, all possible color structures can be written as a sum of color-singlet traces. In the 
more general case, with more than three Wilson lines, the soft function is a color matrix 
which must be traced against the hard functions, which are also matrices in color space, 
appearing in the factorization theorem for the cross section (see e.g. Refs. [110, 177] for 
more details). 

In App. B through App. E we will give operator definitions for the functions appearing
in the factorization theorems in terms of matrix elements of the SCET operators $\chi_{n,\omega}(x)$
and $\mathcal{B}^\mu_{n,\omega\perp}(x)$, as well as products of soft Wilson lines. These matrix elements can be
calculated using the leading power SCET Lagrangian, which can be found in Refs. [79-82],
or by using eikonal Feynman rules in the soft functions, and known results for the
splitting functions to calculate the jet functions [178]. We will use the latter approach, as
it considerably simplifies the calculations at one-loop.

B One Loop Calculations of Collinear Subjets Functions 

In this appendix we collect the calculations relevant to the calculation in the collinear 
subjets region of phase space, and explicitly show the cancellation of anomalous dimensions. 
The calculation follows closely that of Ref. [77], with the exception of the form of the 
measurement function. Nevertheless, the calculation is presented in detail, as the SCET+ 
effective theory has not been widely used.

Kinematics and Notation


For our general kinematic setup, we will denote by $Q$ the center of mass energy of the $e^+e^-$
collisions, so that $Q/2$ is the energy deposited in a hemisphere, i.e. the four-momenta of
the two hemispheres are

$$ p_{\text{hemisphere 1}} = \left( \frac{Q}{2}, \vec{p}_1 \right) , \qquad p_{\text{hemisphere 2}} = \left( \frac{Q}{2}, -\vec{p}_1 \right) , \qquad (B.1) $$

so

$$ s = Q^2 . \qquad (B.2) $$

We will also denote the energy in a jet at intermediate stages of the calculation by $E_J$, but
we will write our final results in terms of $Q$.

We work in the region where one hemispherical jet splits into two hard subjets, assume 
the power counting z ~ with z being the energy fraction of one of the jets. We further 
assume the power counting relations between the energy correlation functions valid in 
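the collinear subjets region, as discussed in Sec. 2.2. We adopt the following notation to
describe the kinematics of the subjets:

Subjet a,b momenta:
$$ p_a , \; p_b \qquad (B.3) $$

Subjet a,b spatial directions:
$$ \hat{n}_a , \; \hat{n}_b \qquad (B.4) $$

Thrust axis:
$$ \hat{n} = \frac{\hat{n}_a + \hat{n}_b}{|\hat{n}_a + \hat{n}_b|} \qquad (B.5) $$

Light-cone vectors:
$$ n = (1, \hat{n}) , \quad \bar{n} = (1, -\hat{n}) , \quad n_{a,b} = (1, \hat{n}_{a,b}) , \quad \bar{n}_{a,b} = (1, -\hat{n}_{a,b}) . \qquad (B.6) $$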

In the collinear subjets region of phase space, we have $n_a \cdot n_b \ll 1$. When performing expansions,
we can work to leading order in $n_a \cdot n_b$, and must use a consistent power counting.
It is therefore useful to collect some kinematic relations between vectors which are valid at
leading power. These will be useful for later evaluations of the measurement function and
integrand at leading power. The kinematics satisfy the following relations:

$$ n \cdot n_a = n \cdot n_b = \frac{n_a \cdot n_b}{4} , \qquad (B.7) $$

$$ \bar{n} \cdot n_a = \bar{n} \cdot n_b = 2 , \qquad (B.8) $$

$$ n_{\perp a,b} \cdot n_{\perp a,b} = \bar{n}_{\perp a,b} \cdot \bar{n}_{\perp a,b} = - n_{\perp a,b} \cdot \bar{n}_{\perp a,b} = - \frac{n_a \cdot n_b}{2} . \qquad (B.9) $$
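
These leading-power relations are easy to verify numerically for a symmetric two-subjet configuration. The sketch below is an illustration only: the helper names are hypothetical, the subjets are placed at angles $\pm\theta/2$ about the $z$ axis, and the perpendicular components are taken with respect to the thrust axis using the mostly-minus metric.

```python
import math

def mdot(a, b):
    """Minkowski product with signature (+, -, -, -)."""
    return a[0] * b[0] - a[1] * b[1] - a[2] * b[2] - a[3] * b[3]

def perp(v, n, nbar):
    """Component of v transverse to the light-cone directions n and nbar."""
    return tuple(v[i] - 0.5 * mdot(nbar, v) * n[i] - 0.5 * mdot(n, v) * nbar[i]
                 for i in range(4))

def subjet_kinematics(theta):
    """Light-cone products for two subjets separated by opening angle theta."""
    s, c = math.sin(theta / 2.0), math.cos(theta / 2.0)
    n_a, n_b = (1.0, s, 0.0, c), (1.0, -s, 0.0, c)
    nbar_a = (1.0, -s, 0.0, -c)
    # The thrust axis (normalized sum of the subjet directions) is the z axis here.
    n, nbar = (1.0, 0.0, 0.0, 1.0), (1.0, 0.0, 0.0, -1.0)
    n_a_perp, nbar_a_perp = perp(n_a, n, nbar), perp(nbar_a, n, nbar)
    return {
        "n.n_a": mdot(n, n_a),
        "n.n_b": mdot(n, n_b),
        "nbar.n_a": mdot(nbar, n_a),
        "n_a.n_b": mdot(n_a, n_b),
        "perp.perp": mdot(n_a_perp, n_a_perp),
        "barperp.barperp": mdot(nbar_a_perp, nbar_a_perp),
        "perp.barperp": mdot(n_a_perp, nbar_a_perp),
    }

if __name__ == "__main__":
    for key, value in subjet_kinematics(0.02).items():
        print(key, value)
```

In this configuration the analogue of Eq. (B.9) holds exactly, while Eqs. (B.7) and (B.8) receive corrections of relative order $\theta^2$, consistent with their being leading-power statements.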

For a particle with the power counting of collinear sector a or b, we have the following
simplified relations

$$ p_a \simeq \frac{1}{2} (\bar{n} \cdot p_a) \, n_a , \qquad p_b \simeq \frac{1}{2} (\bar{n} \cdot p_b) \, n_b , \qquad (B.10) $$

$$ p_a^0 \simeq \frac{1}{2} (\bar{n} \cdot p_a) , \qquad p_b^0 \simeq \frac{1}{2} (\bar{n} \cdot p_b) , \qquad (B.11) $$

which are true to leading order in the power counting. Finally, we label the energy fractions
carried in each subjet by

$$ z_{a,b} = \frac{2 p^0_{a,b}}{Q} = \frac{\bar{n} \cdot p_{a,b}}{Q} , \qquad (B.12) $$

where the second relation is true to leading power.

The value of $e_2^{(\alpha)}$ is given to leading power by the subjet splitting

$$ e_2^{(\alpha)} = \frac{1}{E_J^2} E_a E_b \left( \frac{2 p_a \cdot p_b}{E_a E_b} \right)^{\alpha/2} \qquad (B.13) $$

$$ \phantom{e_2^{(\alpha)}} = 2^{\alpha/2} z_a z_b \left( n_a \cdot n_b \right)^{\alpha/2} . \qquad (B.14) $$
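
As a quick cross-check of the two-point correlator for a pair of massless subjets, the sketch below evaluates $e_2^{(\alpha)}$ once from the momenta, $E_a E_b (2 p_a \cdot p_b / E_a E_b)^{\alpha/2} / E_J^2$, and once from the energy fractions and the angle, $2^{\alpha/2} z_a z_b (n_a \cdot n_b)^{\alpha/2}$. The function names and the numerical inputs are illustrative assumptions; the jet energy is taken to be $E_J = Q/2$.

```python
import math

def mdot(a, b):
    """Minkowski product with signature (+, -, -, -)."""
    return a[0] * b[0] - a[1] * b[1] - a[2] * b[2] - a[3] * b[3]

def massless(energy, angle):
    """Massless four-momentum in the x-z plane at polar angle `angle`."""
    return (energy, energy * math.sin(angle), 0.0, energy * math.cos(angle))

def e2_from_momenta(p_a, p_b, e_jet, alpha):
    """Two-point correlator evaluated directly from the two subjet momenta."""
    angle_factor = (2.0 * mdot(p_a, p_b) / (p_a[0] * p_b[0])) ** (alpha / 2.0)
    return p_a[0] * p_b[0] / e_jet**2 * angle_factor

def e2_from_fractions(z_a, z_b, na_dot_nb, alpha):
    """Two-point correlator from energy fractions and the subjet opening angle."""
    return 2.0 ** (alpha / 2.0) * z_a * z_b * na_dot_nb ** (alpha / 2.0)

def compare(q=1000.0, z_a=0.7, theta=0.2, alpha=2.0):
    z_b = 1.0 - z_a
    p_a = massless(z_a * q / 2.0, theta / 2.0)
    p_b = massless(z_b * q / 2.0, -theta / 2.0)
    na_dot_nb = 1.0 - math.cos(theta)  # n_a . n_b for light-cone vectors (1, n_hat)
    return (e2_from_momenta(p_a, p_b, q / 2.0, alpha),
            e2_from_fractions(z_a, z_b, na_dot_nb, alpha))

if __name__ == "__main__":
    print(compare())
```

For exactly massless subjets the two expressions coincide; differences only appear once the subjets acquire mass or additional radiation is included, which is where the leading-power expansion enters.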


In the collinear subjets region of phase space, the 3-point energy correlation function
is dominated by the correlation between two particles in different subjets, with a third
collinear, soft, or collinear-soft particle. Depending on the identity of the third particle,
the power counting of the observable is different. We begin by collecting expressions for the
$e_3^{(\alpha)}$ observable for a single soft, collinear-soft, or collinear emission, which will be required
for the one-loop calculations.

For three emissions, with momenta $k_1, k_2, k_3$, the general expression for the three point
energy correlation function is

$$ e_3^{(\alpha)} = \frac{1}{E_J^3} \, k_1^0 k_2^0 k_3^0 \left( \frac{2 k_1 \cdot k_2}{k_1^0 k_2^0} \right)^{\alpha/2} \left( \frac{2 k_1 \cdot k_3}{k_1^0 k_3^0} \right)^{\alpha/2} \left( \frac{2 k_2 \cdot k_3}{k_2^0 k_3^0} \right)^{\alpha/2} . \qquad (B.15) $$


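For an emission collinear with one of the subjets, where we have the splitting $p_{a,b} \to k_1 + k_2$,
we can write $e_3^{(\alpha)}$ entirely in terms of $k_1 \cdot k_2$, $n_a \cdot n_b$, and $\bar{n}_a \cdot k_{1,2}$, because there is a hierarchy
between the opening angle of the dipole, and the opening angle of the splitting. At leading
power it is given by

$$ e_3^{(\alpha)} \Big|_{k_1, k_2 \parallel n_a} = 2^{5\alpha/2} \, z_b \left( n_a \cdot n_b \right)^{\alpha} \left( \frac{k_1 \cdot k_2}{Q^2} \right)^{\alpha/2} \left( \frac{\bar{n}_a \cdot k_1}{Q} \right)^{1-\alpha/2} \left( \frac{\bar{n}_a \cdot k_2}{Q} \right)^{1-\alpha/2} , \qquad (B.16) $$

$$ e_3^{(\alpha)} \Big|_{k_1, k_2 \parallel n_b} = 2^{5\alpha/2} \, z_a \left( n_a \cdot n_b \right)^{\alpha} \left( \frac{k_1 \cdot k_2}{Q^2} \right)^{\alpha/2} \left( \frac{\bar{n}_b \cdot k_1}{Q} \right)^{1-\alpha/2} \left( \frac{\bar{n}_b \cdot k_2}{Q} \right)^{1-\alpha/2} . \qquad (B.17) $$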

For a soft emission off the dipole, with momentum $k$, which cannot resolve the
opening angle of the dipole, we have

$$ n_a \cdot k \to n \cdot k , \qquad n_b \cdot k \to n \cdot k , \qquad (B.18) $$

at leading power. We then find

$$ e_3^{(\alpha)} = 2^{3\alpha/2+1} z_a z_b \left( n_a \cdot n_b \right)^{\alpha/2} \left( \frac{n \cdot k + \bar{n} \cdot k}{2Q} \right)^{1-\alpha} \left( \frac{n \cdot k}{Q} \right)^{\alpha} , \qquad (B.19) $$

where we have used the full expression for the energy of the soft particle, as it is not
boosted.

For a third collinear-soft emission $k$ off the $p_{a,b}$ partons, for which there is no
hierarchy between the opening angle of the dipole and the opening angle of the emission
(i.e. a collinear-soft emission), $e_3^{(\alpha)}$ is given by

$$ e_3^{(\alpha)} = 2^{3\alpha/2+1} z_a z_b \left( n_a \cdot n_b \right)^{\alpha/2} \left( \frac{\bar{n} \cdot k}{2Q} \right)^{1-\alpha} \left( \frac{n_a \cdot k}{Q} \right)^{\alpha/2} \left( \frac{n_b \cdot k}{Q} \right)^{\alpha/2} . \qquad (B.20) $$

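For the SCET operators involved in the matching calculation, we follow the notation
of Ref. [77], defining

$$ \mathcal{O}_2 = \bar{\chi}_n \, Y_n^\dagger \, \Gamma \, Y_{\bar{n}} \, \chi_{\bar{n}} , \qquad (B.21) $$

which is the usual SCET operator for $e^+e^- \to$ dijets, and

$$ \mathcal{O}_3 = \bar{\chi}_{n_a} \, \mathcal{B}^A_{\perp n_b} \left[ X_{n_a}^\dagger X_{n_b} T^A X_{n_b}^\dagger V_{\bar{n}} \right]_{ij} \left[ Y_n^\dagger Y_{\bar{n}} \right]_{jk} \Gamma \left[ \chi_{\bar{n}} \right]_k , \qquad (B.22) $$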


which is the SCET$_+$ operator describing the production of the collinear subjets. Throughout
this section, we will not be careful with the Dirac structure of the operators, as it is
largely irrelevant to our discussion. With this in mind, we have not made the Lorentz indices
explicit on the operators. Here we have chosen to write the Wilson line corresponding
to the gluon in the fundamental representation. Note that the two stage matching onto
SCET$_+$ makes it clear that the partonic configuration in which the two collinear subjets
are both quarks is power suppressed. In the operators $\mathcal{O}_2$, $\mathcal{O}_3$, we have used $Y$ to denote
soft Wilson lines, and $X$, $V$ to denote collinear-soft Wilson lines. In the definitions of the
factorized functions below, we will refer to all Wilson lines as $S$, as after factorization, no
confusion can arise.


Definitions of Factorized Functions 

The functions appearing in the collinear subjets factorization theorem of Eq. (3.8) have
the following SCET operator definitions:

• Hard Matching Coefficient for Dijet Production:

$$ H(Q^2, \mu) = |C(Q^2, \mu)|^2 , \qquad (B.23) $$

where $C(Q^2, \mu)$ is the Wilson coefficient obtained from matching the full theory QCD
current onto the SCET dijet operator $\bar{\chi}_n \Gamma \chi_{\bar{n}}$,

$$ \langle q\bar{q} | \bar{\psi} \Gamma \psi | 0 \rangle = C(Q^2, \mu) \, \langle q\bar{q} | \mathcal{O}_2 | 0 \rangle . \qquad (B.24) $$

When accounting for the Lorentz structure, there is a contraction with the leptonic
tensor, which we have dropped for simplicity. See Ref. [110] for a detailed discussion.


• Hard Splitting Function: 






(B.25)

where $C_2$ is the Wilson coefficient in the matching from $\mathcal{O}_2$ to $\mathcal{O}_3$, namely
the relation between the following matrix elements

$$ \langle q\bar{q}g | \mathcal{O}_2 | 0 \rangle = C_2 \, \langle q\bar{q}g | \mathcal{O}_3 | 0 \rangle . \qquad (B.26) $$

• Jet Function: 

■fn^(4 a) ) = (B.27) 

^tr(0|^Xn o , 6 (0)«5(Q - n a , b ■ V)S^ (V ± )s{e^ - E 3 ^)xn a , 6 (0)|0) 

For simplicity, we have given the definition of the quark jet function. The gluon
jet function is defined identically but with the SCET collinear invariant gluon field,
$\mathcal{B}_{n_{a,b},\perp}$, instead of the collinear invariant quark field.

• Soft Function: 

Snn(4 a) ;^) = ^tr(0|T{S n ^}<l(e!j a) - @ R E 3 ^f{S n Sn}\0) (B.28) 

• Collinear-Soft Function: 

5n„n 6 n(e5 Q) ) = tT(0\T{S na S nb Sn}5 (e< a) - E 3 ^)f{S na S nb Sn}\0) (B.29) 

In each of these definitions, we have defined an operator, $\mathbf{E}_3^{(\alpha)}$, which measures the contribution
to $e_3^{(\alpha)}$ from final states, and must be appropriately expanded following the power
counting of the sector on which it acts, as was shown explicitly in Eq. (B.16), Eq. (B.19),
and Eq. (B.20). These operators can be written in terms of the energy-momentum tensor
of the full or effective theory [179-182], but we can simply view them as returning the value
of $e_3^{(\alpha)}$ as measured on a particular perturbative state. The soft function is also sensitive
to the jet function definition, which is included through the operator $\Theta_R$. To simplify
the notation, we have strictly speaking only defined the in-jet contribution to the soft
function. Additionally, we assume that some IRC safe observable is also measured in the
out-of-jet region, although this will play little role in our discussion, so we have not made
it explicit.


Hard Matching Coefficient for Dijet Production 

The hard matching coefficient for dijet production, $H(Q^2,\mu)$, appears in the factorization
theorems in each region of phase space. $H(Q^2,\mu)$ is the well known hard function for the
production of a $q\bar q$ pair in $e^+e^-$ annihilation. It is defined by

$$H(Q^2,\mu) = |C(Q^2,\mu)|^2 , \tag{B.30}$$

where $C(Q^2,\mu)$ is the Wilson coefficient obtained from matching the full theory QCD
current onto the SCET dijet operator $\bar\chi_n \Gamma \chi_{\bar n}$. This Wilson coefficient is well known
(see, e.g., Refs. [77, 110, 183, 184]), and is given at one-loop by


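$$C(Q^2,\mu) = 1 + \frac{\alpha_s(\mu)\,C_F}{4\pi}\left(-\log^2\left[\frac{-Q^2}{\mu^2}\right] + 3\log\left[\frac{-Q^2}{\mu^2}\right] - 8 + \frac{\pi^2}{6}\right). \tag{B.31}$$

The branch cut in the logarithms must be taken as $-Q^2 \to -Q^2 - i\epsilon$. The hard function
satisfies a multiplicative RGE, given by

$$\mu\frac{d}{d\mu}\log H(Q^2,\mu) = 2\,\mathrm{Re}\left[\gamma_C(Q^2,\mu)\right], \tag{B.32}$$

where $\gamma_C(Q^2,\mu)$ is the anomalous dimension for the Wilson coefficient, which is given to
one-loop by

$$\gamma_C(Q^2,\mu) = \frac{\alpha_s C_F}{4\pi}\left(4\log\left[\frac{-Q^2}{\mu^2}\right] - 6\right). \tag{B.33}$$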
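As a cross-check of Eqs. (B.31) and (B.33), the anomalous dimension can be recovered numerically from the one-loop coefficient. The sketch below is illustrative only: it holds $\alpha_s$ fixed (its running enters at higher order), takes $C_F = 4/3$, and all function names are our own.

```python
import cmath
import math

CF = 4.0 / 3.0  # assumed SU(3) value


def big_log(Q2, mu):
    # Branch choice -Q^2 -> -Q^2 - i*epsilon: log(-Q^2/mu^2) = log(Q^2/mu^2) - i*pi
    return math.log(Q2 / mu**2) - 1j * math.pi


def c_one_loop(Q2, mu, alpha_s):
    """One-loop dijet Wilson coefficient, Eq. (B.31), at fixed alpha_s."""
    L = big_log(Q2, mu)
    return 1.0 + alpha_s * CF / (4.0 * math.pi) * (-L**2 + 3.0 * L - 8.0 + math.pi**2 / 6.0)


def gamma_c(Q2, mu, alpha_s):
    """One-loop anomalous dimension of the Wilson coefficient, Eq. (B.33)."""
    return alpha_s * CF / (4.0 * math.pi) * (4.0 * big_log(Q2, mu) - 6.0)


def dlogc_dlogmu(Q2, mu, alpha_s, h=1e-5):
    """Central finite difference of log C with respect to log mu."""
    up = cmath.log(c_one_loop(Q2, mu * math.exp(h), alpha_s))
    down = cmath.log(c_one_loop(Q2, mu * math.exp(-h), alpha_s))
    return (up - down) / (2.0 * h)
```

For a small coupling the finite-difference derivative of $\log C$ agrees with $\gamma_C$ up to corrections that are suppressed by a further power of $\alpha_s$, and $2\,\mathrm{Re}[\gamma_C]$ reproduces the real logarithm that enters the hard function RGE.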


Hard Splitting Function 

The hard splitting function can be calculated using known results for the one-loop splitting
functions [185] or from the result for $e^+e^- \to 3$ jets [186]. However, since at leading power
the measurement of the 2-point energy correlation functions defines the energy fractions
and splitting angle, it is simplest to change variables in the results of Ref. [77], where the
hard splitting function matching was performed for jet mass. Using the notation $t = s_{qg}$,
and $x = s_{\bar q q}/Q^2$, Ref. [77] gave the matching coefficient to one-loop as

$$\begin{aligned}
H_2^{q\to qg}(t,x,\mu) = Q^2\,\frac{\alpha_s(\mu)C_F}{2\pi}\,\frac{1}{t}\,\frac{1+x^2}{1-x}\Bigg\{1 + \frac{\alpha_s(\mu)}{2\pi}\Bigg[&\left(\frac{C_A}{2}-C_F\right)\left(2\log\frac{t}{\mu^2}\log x + \log^2 x + 2\,\mathrm{Li}_2(1-x)\right) \\
&-\frac{C_A}{2}\left(\log^2\frac{t}{\mu^2} - \frac{7\pi^2}{6} + 2\log\frac{t}{\mu^2}\log(1-x) + \log^2(1-x) + 2\,\mathrm{Li}_2(x)\right) \\
&+ (C_A - C_F)\,\frac{1-x}{1+x^2}\Bigg]\Bigg\}. 
\end{aligned} \tag{B.34}$$


We can now perform a change of variables to rewrite this in terms of $e_2^{(\alpha)}$, and the subjet
energy fractions, using the leading power relation of Eq. (B.13), and the kinematic relations
valid in the collinear subjets region of phase space. We find

$$t = \frac{Q^2}{4}\,(z_a z_b)^{1-2/\alpha}\left(e_2^{(\alpha)}\right)^{2/\alpha}, \qquad x = z_q, \tag{B.35}$$


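and

$$\begin{aligned}
H_2^{q\to qg}\left(e_2^{(\alpha)},z_q,\mu\right) &= \frac{\alpha_s(\mu)C_F}{2\pi}\,\frac{4}{(z_a z_b)^{1-2/\alpha}\left(e_2^{(\alpha)}\right)^{2/\alpha}}\,\frac{1+z_q^2}{1-z_q} \\
&\quad\times\Bigg\{1 + \frac{\alpha_s(\mu)}{2\pi}\Bigg[\left(\frac{C_A}{2}-C_F\right)\left(2\log\left[\frac{Q^2 (z_a z_b)^{1-2/\alpha}\left(e_2^{(\alpha)}\right)^{2/\alpha}}{4\mu^2}\right]\log z_q + \log^2 z_q + 2\,\mathrm{Li}_2(1-z_q)\right) \\
&\qquad -\frac{C_A}{2}\Bigg(\log^2\left[\frac{Q^2 (z_a z_b)^{1-2/\alpha}\left(e_2^{(\alpha)}\right)^{2/\alpha}}{4\mu^2}\right] - \frac{7\pi^2}{6} \\
&\qquad\qquad + 2\log\left[\frac{Q^2 (z_a z_b)^{1-2/\alpha}\left(e_2^{(\alpha)}\right)^{2/\alpha}}{4\mu^2}\right]\log(1-z_q) + \log^2(1-z_q) + 2\,\mathrm{Li}_2(z_q)\Bigg) \\
&\qquad + (C_A - C_F)\,\frac{1-z_q}{1+z_q^2}\Bigg]\Bigg\}.
\end{aligned} \tag{B.36}$$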

Note that the hard splitting function depends on the partons involved in the split, which
in our case we have taken to be $q \to qg$, and therefore singled out $z_q$, which is the energy
fraction of the quark jet (defined identically to $z_a$, $z_b$). Throughout the rest of this appendix,
we will, whenever possible, write results in terms of $z_a$ and $z_b$ for generic partons, using
general Casimirs. Since we consider the case $q \to qg$, we will calculate the jet functions for
both quark and gluon jets, and therefore the results in this appendix are sufficient to treat
general two-prong substructure, where the prongs are associated with generic partons by
using the hard splitting function for other partonic splittings.

For completeness, we also present the one-loop results for $g \to gg$ and $g \to q\bar q$ splittings.
While one-loop, and even two-loop, splitting helicity amplitudes exist in the literature
[185, 187, 188], to our knowledge, the one-loop unpolarized splitting functions have not
been explicitly written down before. Using the results from Refs. [185, 188], the one-loop
function for the $g \to gg$ splitting is


H^"(Sgg,Z,ll) = 


oc s (h)Ca 1 ( z 


2 ir 


°99 

Z 


-h°g 2f ? - hog 2 ,— 

2 fi z 2 1 — z 


1 — z . 

i-U — + 2(1 - 2) 

+ 5T + U-U1 


2 vr 


log—log(z(l - z)) 


D 99 


12 


z(l-z) 


3 N J l + z 4 + {l- z) 4 


(B.37) 


Here, we have expressed the result in terms of the number of colors, $N$, of the gauge theory
and the number of active quarks, $n_F$. Note that $C_A = N$. The virtuality of the splitting is

$$s_{gg} = 2 z_a z_b E_J^2\,(n_a\cdot n_b), \tag{B.38}$$

where $a$ and $b$ denote the final-state gluons in the splitting. Its anomalous dimension to
one-loop is


^ 9^99 ~ 


a s {g) 


7r 


IVlog^ + Nlogz(l - z) - ^ 

fj,- 2 


(B.39) 


For the one-loop result of the $g \to q\bar q$ splitting, we have


H^(S q g,Z^) = 

+ 


a s (g)riF 1 ( 2 


27T Sqq 


(z* + (l-zf) 1 + 


« s (At) 

2vr 


IVlog—log(z(l — z)) 

s qq 


3 1, u? 2 rip, U 2 

-loff- 1 - 

2 N 


, LL“ '2riF , 13 U 1 , 9 11 

log-— log-1-IVlog-1-— log 2 


3 <7<? 




?qq 


2N 


Sqq 


1 77T 2 


„ 7T 


N, 


- N -log z - 

IV 12 6 2 B 1 -z 


40 A7 10 

H- N - riF 

9 9 


(B.40) 


Note that, in terms of the number of colors,

$$C_F = \frac{N^2-1}{2N}.$$

Its anomalous dimension is


^ 9^-99 ~ 


7r 


" 77 log+ N log(z(l - z)) + ^ - 3 C F 
N Sqq 2 


(B.41)

Global Soft Function

In this section we calculate the global soft function. The global soft modes can resolve
the boundaries of the jet, so the jet algorithm constraint cannot be expanded. However,
the soft modes do not resolve the dipole of the collinear splitting. The global soft function
therefore has two Wilson lines in the $n$ and $\bar n$ directions. A general one-loop soft function
can be written as

$$S_G^{(1)}\left(e_3^{(\alpha)}\right) = \frac{1}{2}\sum_{i\neq j}\mathbf{T}_i\cdot\mathbf{T}_j\,S_{G,ij}^{(1)}\left(e_3^{(\alpha)}\right), \tag{B.42}$$

where $\mathbf{T}_i$ is the color generator of leg $i$ in the notation of Refs. [189, 190], and the sum
runs over all pairs of legs. Here we have only the contribution from $i,j = n,\bar n$, but we still
perform this extraction of the color structure to keep the results generic.

The one-loop integrand for the soft function is given by



with $d = 4 - 2\epsilon$, and where here we have extracted the normalization factor

$$N_S = 2^{3\alpha/2+1}\,z_a z_b\,(n_a\cdot n_b)^{\alpha/2}, \tag{B.44}$$

following the expression for the three point energy correlation function in the soft power
counting, given in Eq. (B.19). The first $\Theta$-function in Eq. (B.43) implements the jet
algorithm constraint, which is simple for a single emission. To simplify notation, we also
use the following shorthand for the measure for a positive energy, on-shell, collinear particle

$$[d^dk]_+ = \frac{d^dk}{(2\pi)^{d-1}}\,\Theta(\bar n\cdot k)\,\delta(k^2). \tag{B.45}$$

To perform this integral, it is convenient to make the change of variables

$$\bar n\cdot k = v, \qquad n\cdot k = vu, \tag{B.46}$$

which factorizes the jet algorithm constraint and the measurement function. The integrals
can then be evaluated using standard techniques. Performing all the integrals but the $u$
integral, and transforming to Laplace space, $e_3^{(\alpha)} \to \tilde e_3^{(\alpha)}$, gives


d(l) 

°G,nn 



g 2 e-^T(-2e) ( e^tzei a) N s \ 2€ du 2e(1 _ a) 

(2vr) 2 r(l-e) ^ 2i-«Q J J 0 ’ 

(B.47) 


This can be integrated exactly in terms of hypergeometric functions, 


ftan # 


du n , x 2 e(l-a) = r(-e(l -2a)) 
ni+di- 2 ^ t} T(1 — e(l — 2a)) 

—2(1—a)e 


(B.48) 


, 2 R 

x tan — 


tan 2 

1 + tan 2 f 


2^1 


1, —2(1 — a)e; 1 — (1 - 2a)e; 


t&n 2 § 

1 + tan 2 f 
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where we have used both a Pfaff and an Euler transformation to extract the singular 
behavior from the hypergeometric function. We therefore have 


( 1 ) („) a s e- £ ^r(-2e) ( e™^N s ^ R\~ r(-e(l - 2a)) 


2e 


0 G,nriA e 3 1 — 


7r T(1 — e) 


, ^ tan — 

2 1 -“Q 2 


T(1 - e(l - 2a)) 


(B.49) 


tan 2 ^ \- 2(1 " a) ‘ 


1 + tan 2 f 


!*1 


1, —2(1 — a)e; 1 — (1 — 2a)e; 


tan 2 R 
1 +tan 2 f 


Expanding in $\epsilon$ (throughout these appendices we use the HypExp package [191, 192] for
expansions of hypergeometric functions) and separating into divergent and finite pieces, we
find


c(l)div/~(a)\ _ a s 
S G,nn l e 3 ) ~ o_ 


S(l)fin,-(ah _ a s 
a G,nn\ e Z ) ~ n 
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(B.50) 


2 1 ~ a Q 


2 a - 1 


+ log 


e lE [ie\ Ns 
2 1 ~ a Q 


log 
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7r 


2a — 1 2 

H-;—log 2 


2 R 
tm 2 


+ (a - l)Li 2 


2 

tan - 


2 

— tan — 


(B.51) 


8(2a - 1) ' 4 

where $\mathrm{Li}_2$ is the dilogarithm function.

Jet Function 

To calculate the jet function, we use the approach of Ref. [178] and integrate the appropriate
splitting functions against our measurement function. In the power counting of the jet
function, we can expand the jet algorithm constraint

$$\Theta\left(\tan^2\frac{R}{2} - \frac{n\cdot k}{\bar n\cdot k}\right) \to 1. \tag{B.52}$$

The one-loop jet function in the $n_a$ direction is then given by

$$J^{(1)}_{i,n_a}\left(Q_J, e_3^{(\alpha)}\right) = \int d\Phi_2^c\,\sigma_2^c\,\delta\left(e_3^{(\alpha)} - N_J\left(\frac{\bar n_a\cdot k_1}{Q}\right)^{1-\alpha/2}\left(\frac{\bar n_a\cdot k_2}{Q}\right)^{1-\alpha/2}\left(\frac{k_1\cdot k_2}{Q^2}\right)^{\alpha/2}\right). \tag{B.53}$$

The two particle collinear phase space is given by [193]

$$d\Phi_2^c = 2(2\pi)^{3-2\epsilon}\,Q_J\,[d^dk_1]_+\,[d^dk_2]_+\,\delta(Q_J - \bar n_a\cdot k_1 - \bar n_a\cdot k_2)\,\delta^{d-2}(k_{1\perp} + k_{2\perp}), \tag{B.54}$$


and

$$\sigma_2^c = \left(\frac{\mu^2 e^{\gamma_E}}{4\pi}\right)^{\epsilon}\frac{2g^2}{s}\,P_i(z), \tag{B.55}$$

where

$$P_q(z) = C_F\left[\frac{1+z^2}{1-z} - \epsilon(1-z)\right], \tag{B.56}$$

and

$$P_g(z) = C_A\left[\frac{z}{1-z} + \frac{1-z}{z} + z(1-z)\right] + \frac{n_f}{2}\left[1 - \frac{2z(1-z)}{1-\epsilon}\right], \tag{B.57}$$

which includes both the $g \to gg$ and $g \to q\bar q$ contributions. Explicitly, the integrand is
then given by



(«)\ _ ( ^ elE 

4n 


2(27r) 3 2 e Qj 2 g 2 I [d d k 1 } + I [d d k 2 ] + - 
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rig-ki 
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(B.58) 


x S(Qj -n a -ki-n a - k 2 )5 d 2 (k 1± + k 2 ±) 


xS\4 a) -Nj 


— 7 \ 1 — a/2 / — 7 \ 1 — a/2 / 7 7 \ a/2 

n a -k\\ ‘ f n a • K 2 \ ' ( k\ • fc 2 \ ' 


Q 
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Q 2 


where we have extracted the normalization factor

$$N_J = 2^{5\alpha/2}\,(n_a\cdot n_b)^{\alpha}\,z_b, \tag{B.59}$$

for simplicity, following the expression of Eq. (B.16) for the three point energy correlation
function in the power counting for the emission of a single collinear particle. Furthermore,
note that we have used $Q_J = z_a Q$ in this expression.

The integrals can be performed using standard techniques, and we find, after transforming
to Laplace space, $e_3^{(\alpha)} \to \tilde e_3^{(\alpha)}$, for the jet function in the $n_a$ direction
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(B.60) 


for gluon jets, and 


7(i) (n U Q )\ _ I a 3 2 Ja /( a ) 

J g,n a \Qji ®3 - e 2 ( 1 3 a ) + 2e - e (TCaj L “ l 63 
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(B.61) 
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a(l — a) 

7T 2 37r 2 (l — a) 1 13(1 — a) 

4a(l — a) 4a 2a 2a 


for quark jets respectively. The jet function for the $n_b$ direction can be trivially found from
$a \to b$.

Here we have used $L_\alpha^{J,a}$ to denote the logarithm appearing in the jet functions.
The argument of this logarithm depends on the subjet energy fraction. We indicate the
specific logarithm for the subjet via the notation

Li’ a (= log 


Nje^e 


( q ) d 7£ ( _ 


\y/2 Q 


Jl — OL 


(B.62) 


Collinear-Soft Function 


We now calculate the collinear-soft function. The collinear-soft modes couple eikonally to
the collinear sector, and so the collinear-soft function has the one-loop form

$$S_c^{(1)}\left(e_3^{(\alpha)}\right) = \frac{1}{2}\sum_{i\neq j}\mathbf{T}_i\cdot\mathbf{T}_j\,S_{c,ij}^{(1)}\left(e_3^{(\alpha)}\right), \tag{B.63}$$

where $\mathbf{T}_i$ is the color generator of leg $i$ in the notation of Refs. [189, 190], and the sum runs
over all pairs of legs. Since the collinear-soft modes resolve the dipole from the collinear
splitting, there are three Wilson lines, $n_a$, $n_b$, $\bar n$, to which the collinear-soft modes couple.
We calculate separately the contributions arising from the pair of legs $n_a, n_b$, and from the
pairs $n_{a,b}, \bar n$. In both cases the integral involves the jet algorithm constraint. In the power
counting of the collinear-soft modes, this constraint can be expanded as

$$\Theta\left(\tan^2\frac{R}{2} - \frac{n\cdot k}{\bar n\cdot k}\right) \to 1. \tag{B.64}$$

If this expansion were not performed, the contribution of the collinear soft modes sensitive
to the jet radius $R$ would be removed by a soft zero bin subtraction.
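The jet algorithm constraint used above has a simple geometric meaning, which the following sketch illustrates. It assumes light-cone vectors $n = (1,0,0,1)$ and $\bar n = (1,0,0,-1)$ and a massless particle at polar angle $\theta$ from the $n$ axis, so that $n\cdot k/\bar n\cdot k = \tan^2(\theta/2)$; the function names are our own.

```python
import math


def lightcone_ratio(theta, energy=1.0):
    """Return (n.k)/(nbar.k) for a massless particle at polar angle theta from n."""
    n_dot_k = energy * (1.0 - math.cos(theta))     # n = (1, 0, 0, 1)
    nbar_dot_k = energy * (1.0 + math.cos(theta))  # nbar = (1, 0, 0, -1)
    return n_dot_k / nbar_dot_k


def in_jet(theta, R):
    """Cone constraint Theta(tan^2(R/2) - n.k/nbar.k) for a single emission."""
    return lightcone_ratio(theta) < math.tan(R / 2.0) ** 2
```

For an emission whose angle is parametrically smaller than $R$, the ratio is far below $\tan^2(R/2)$, which is why the constraint can be set to one in the collinear and collinear-soft power counting.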


n a , n b Contribution: 

We begin by calculating the contribution from the emission between the $n_a$, $n_b$ eikonal
lines. The integrand is given by

^ - (B.65) 
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(B.66) 


for simplicity, following the expression of Eq. (B.20) for the three point energy correlation 
function in the power counting for the emission of a single collinear-soft particle. 

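To perform the calculation, we go to the light-cone basis defined by $n$, $\bar n$. We then have

$$\begin{aligned}
n_a\cdot k &= \frac{n\cdot n_a}{2}\,\bar n\cdot k + \frac{\bar n\cdot n_a}{2}\,n\cdot k + k_\perp\cdot n_{a\perp} \\
&= \frac{n\cdot n_a}{2}\,\bar n\cdot k + \frac{\bar n\cdot n_a}{2}\,n\cdot k - (\bar n\cdot k\;n\cdot k)^{1/2}\,|n_{a\perp}|\cos\theta,
\end{aligned} \tag{B.67}$$

$$\begin{aligned}
n_b\cdot k &= \frac{n\cdot n_b}{2}\,\bar n\cdot k + \frac{\bar n\cdot n_b}{2}\,n\cdot k + k_\perp\cdot n_{b\perp} \\
&= \frac{n\cdot n_b}{2}\,\bar n\cdot k + \frac{\bar n\cdot n_b}{2}\,n\cdot k + (\bar n\cdot k\;n\cdot k)^{1/2}\,|n_{a\perp}|\cos\theta,
\end{aligned} \tag{B.68}$$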
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where 6 denotes the angle between the particle k and the n axis. In the above kinematic 
relations, we have made use of the fact that since h ~ h a + h b , k± ■ n b ± = — k± ■ n a j_. 
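Rewriting the integration measure for a positive energy gluon in terms of $\theta$, we find

$$\begin{aligned}
[d^dk]_+ &= \frac{1}{2^{4-2\epsilon}\,\pi^{5/2-\epsilon}\,\Gamma\left(\frac{1}{2}-\epsilon\right)}\int_0^\infty d n\cdot k\int_0^\infty d\bar n\cdot k\;(n\cdot k\;\bar n\cdot k)^{-\epsilon}\int_0^\pi d\theta\,\sin^{-2\epsilon}\theta \qquad &\text{(B.69)} \\
&= c_\epsilon\int_0^\infty\frac{d n\cdot k\;d\bar n\cdot k}{(n\cdot k\;\bar n\cdot k)^{\epsilon}}\int_0^\pi d\theta\,\sin^{-2\epsilon}\theta, \qquad &\text{(B.70)}
\end{aligned}$$

for $d = 4 - 2\epsilon$. To simplify our expressions, we have extracted the following constant

$$c_\epsilon = \frac{1}{2^{4-2\epsilon}\,\pi^{5/2-\epsilon}\,\Gamma\left(\frac{1}{2}-\epsilon\right)}.$$

(B.71)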
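The constant $c_\epsilon$ can be recovered by counting the factors that arise when the on-shell measure is written in light-cone variables. The sketch below assumes the normalization $[d^dk]_+ = d^dk\,\delta(k^2)/(2\pi)^{d-1}$ for the one-particle measure (an assumption on our part) and compares the closed form with that counting; the function names are hypothetical.

```python
import math


def c_eps(eps):
    """Closed form of the constant in Eq. (B.71)."""
    return 1.0 / (2.0 ** (4.0 - 2.0 * eps) * math.pi ** (2.5 - eps) * math.gamma(0.5 - eps))


def c_eps_from_measure(eps):
    """Same constant assembled factor by factor for d = 4 - 2*eps."""
    d = 4.0 - 2.0 * eps
    lightcone_jacobian = 0.5  # d^d k = (1/2) d(n.k) d(nbar.k) d^{d-2} k_perp
    delta_function = 0.5      # integrating delta(k^2) over |k_perp|
    # Azimuthal solid angle in d-2 transverse dimensions, with d(theta) sin^{-2 eps} left over
    azimuthal = 2.0 * math.pi ** (0.5 - eps) / math.gamma(0.5 - eps)
    return lightcone_jacobian * delta_function * azimuthal / (2.0 * math.pi) ** (d - 1.0)
```

At $\epsilon = 0$ both expressions reduce to $1/(16\pi^3)$, the familiar four-dimensional value.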


In the collinear soft region of phase space, we power count $n_a\cdot n_b \ll 1$. We can therefore
work to leading power in $n_a\cdot n_b$ in the integrand. Using the relations of Eq. (B.7)-Eq. (B.9),
and expanding to leading power in $n_a\cdot n_b$, we have

$$n_a\cdot k\;n_b\cdot k = \left(n\cdot k + \frac{n_a\cdot n_b}{8}\,\bar n\cdot k\right)^2 - \frac{n_a\cdot n_b}{2}\,(n\cdot k\;\bar n\cdot k)\cos^2\theta. \tag{B.72}$$

Note that in our power counting, $n\cdot k \sim n_a\cdot n_b$, so that this expression scales homogeneously.
To perform the integral, we make the change of variables


n ■ k = v, 


n ■ k = vw 


n a ■ n b 


(B.73) 


We then have 
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The one loop expression for the collinear soft function can then be written 
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(B.75) 

(B.76) 
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The v integral is straightforward. Transforming to Laplace space, e 3 
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The 9 integral can be performed exactly in terms of hypergeometric functions using 
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(1 - wf { ~ 1+ae) 2 -Fi 


T[l-e] 

which can be rewritten using a Pfaff transformation as 


1 — ae, 1/2 — e, 1 — e, — 


4u> 


sin- 2e 0 [(1 - wf + 4u; sin 2 9] 1+ “ e = 


\2\ —1/2—(1—a)e 


>^1 


x (1 + w)~ 1+2e ((1 -wf) 

The remaining integral in w is given by 

Vw& e ™ ( 4“ ) ) 2 (=*?r i+2 “y 


r[i - f] 

1/2 — e, —e + ae, 1 — e 


(1 — wf 

(B.79) 
4:W 


41n b (4 Q) ) = -5 2 r(-2e) 


47T4 1 “Q 2 


16c f 


(1 + wf \ 
F[l/2 - e]T[ 1/2] 


T[1 - e] 


r^r^^-wf) 

Jo w 


2\ —1/2—(1—a)e 




1/2 — e, —e + ae, 1 — e, 


4ru 


(1 + w) 2 J 

(B.80) 


Re-mapping the integral to the unit interval, we have 

Sd> (jW) - c 2 r( 2e) ( ' ;2 JV c S « tc ( 4°') 2 (=y )" 1+2 “ V le . EM - 

‘-’c,n a n b K e 3 ) 9 1 1 ^„/H-(ve02 iDCe PM X 


47T4 1 a Q 2 

\ 

£ dw(w~ e + w (1 ~ 2a) ^ (1 + w)~ 1+2€ {1 - 2 f x 


r [i - e] 

/ 

1/2 — e, —e + ae, 1 — e, 


4'u; 


(1 + re) 2 

We could not perform this integral exactly, but it can be done as a Laurent expansion in
$\epsilon$ by expanding the hypergeometric function as

$${}_2F_1\left[\frac{1}{2}-\epsilon,\;-\epsilon+\alpha\epsilon;\;1-\epsilon;\;\frac{4w}{(1+w)^2}\right] = 1 - 2(1-\alpha)\,\epsilon\,\ln(1+w) + O(\epsilon^2), \tag{B.81}$$

which is valid for $0 < w < 1$, and we have truncated the expansion at $O(\epsilon^2)$ as we are only
interested in the terms up to $O(\epsilon^0)$ in the one-loop result. We then have

/ / x 2 iV 2 5 e^(ef ) ) 2 (M)- 1 + 2 a V. 


41n 6 (4 a) ) = Vr(-2e) 


T[l/2 — e]T[l/2] 

r [i - e] 


4-7T4 1 a Q 2 j 

J 1 dw(w~ e + w {1 ~ 2a) ^ (1 + w)~ 1+2e ( 1 - w )- 1 - 2(1 -“) £ (1 - 2(1 - a)e ln(l + re)) . 

(B.82) 

For the remaining integral in w, we have 

£ dw(w~ e + w (1 - 2o)e } (1 + w)~ 1+2e (l - w )-i-2(i-«) e (! _ 2(1 - a)e ln(l + w)) = 

dw(rv~ e + w (1 - 2o)e } (1 + w)~ 1+2e (l - w)~ 1 -2( 1 - a ) e (B.83) 

- 2(1 - a)e jf 1 du;(V e + (1 + u>)“ 1+2e (l - „,)-i-2(i-a) e l og (l + w ). 
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The first integral can be done in terms of hypergeometric functions, while the second can 
be done using plus functions (for a detailed discussion of their properties, see e.g. [145]), 
and applying the identity 


yl+ae 


1 . . (—ae) 1 , v 

— S(z) + — rr—Vi(z), 

ae ' i\ 


(B.84) 


with 


T>i(z) = 


i=0 


logV 


We find 
/•l 


dw(w^ + w^- 2a ^)((l + wY h+e 


lo 


+ 


r[2(a-l)e]r[l-e] 

T[1 - 3e + 2ae] 

T[2(a — l)e]T[l + e — 2ae] 

~ r[l-e] ' 


, s 2 \4~ (1_a)e 

l-w) ) 

2 -Pi [1 — 2e, 1 — e; 1 — 3e + 2ae; —1] 

2 Pi[l — 2e, 1 + e — 2ae; 1 — e; -1] 


(B.85) 

(1 — 2(1 — a)eln(l + w)) 

(B.86) 


+ 2 ie log2 - 2(1 - a)e log 2 2 - 


12 


1 ^ alog(2) + ^ e (—7r 2 a 2 + 36a 2 log 2 (2) + 37r 2 a — 24alog 2 (2) — 2tt 2 ) 


(2a — 2)e a — 1 
Therefore, in total, we have 


12(a — 1) 


h c,n a n b { e 3 ) 9 1 1 /t^/|l-a/02 ibCe Ffl _ 


,-l+2a\ 6 


47T4 1 “Q 2 


r [i - e] 


1 + alog(2) ^ ^ e (—7r 2 a 2 + 36a 2 log 2 (2) + 37r 2 a — 24alog 2 (2) — 27 t 2 ) 


(2a — 2)e a-1 


12(a - 1) 


(B.87) 


Expanding in e, and keeping only the divergent piece, as relevant for the anomalous di¬ 
mensions, we find 


c(l)div /~W\ = a s 1 2—— 

b c,n a n b ye 3 ) 

a., 1 


+ 2 


( 2alog(2) + log 

a* V 

kjVcse^Ce^K^" 6 )' 172 ^! 

2 1 -“Q 

7T 

(a - l)e 

a s L“ 


7T (a — l)e ’ 



- log(2) 


where 


= ^g 


/ liN C se- lE {e ( 3 0) ) ( n « ' n b) 

V V^Q 


—l/2+a' 


(B.88) 


(B.89) 
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n a ,n and n b ,n Contributions: 

We now calculate the $n_a, \bar n$ contribution to the collinear-soft function. The $n_b, \bar n$ contribution
will be identical. The one-loop integrand is given by

(B.90) 
n b ■ k^ a//2N 


Sln a n(4 a) ) = 


9 


2 ( fi 2 e /E 
47r 


2 ri n ■ n 




n ■ k 
~ 2 Q 


1-0 / n a .^ a/2 


Q 


Q 


where we have again extracted the normalization factor 

N C s = 2 3a/2+1 z a z b (n a • n b ) 2 


(B.91) 


As with the n a ■ n b contribution, we expand the integrand to leading power in n a ■ n b 
using 

n a -n b _ \ 2 n a -n b; 


n a ■ k n b ■ k = yn ■ k + 

n a ■ n = 2 , 

n a -n b _ 


-n-k — 


- (n • kn ■ k) cos 2 9 , 


n n ■ k = 


n ■ k + n ■ k — (n • kn ■ k ) 1 ^ 2 cos 6 . 


To perform the integral, it is again convenient to make the change of variables 

'n a ■ n b ' 


We then have 


n ■ k = v, n ■ k = vw 


n a ■ kn b ■ k = v 2 [(1 — w ) 2 + 4w sin 2 6 \ , 


; n a -n b 

n a ■ k = —-—v + vw 


n a ■ n b 


— \ v w 


2 [n a -n b \\ V 2 jn a ■ n b 


cos 9 


= v 


n a ■ n b 


(l + w — 2y / u;cos0) 


(B.92) 

(B.93) 

(B.94) 

(B.95) 

(B.96) 

(B.97) 


The one-loop expression for the contribution to the collinear soft function can then be 
written 


41.(4 q) ) = 

~9 


(B.98) 


2 ( ^ 2 e 7£ 

47T 


4c f 


n a ■ n b 


dw 


dv 


1 0 Jo v 1+2e Jo 


dO sm~ 2e 9 


1 + w — 2y/w cos 9 


This integral can be performed in a similar manner to the $n_a, n_b$ integral. The $v$ integral
is straightforward. After transforming to Laplace space, $e_3^{(\alpha)} \to \tilde e_3^{(\alpha)}$, we find


^ln(4 Q) ) = -5 2 4c e T(-2e) 




-1+2q\ e 


1,00 rhn /’ 7r 

— / d9 sin"* 

1 W e In 


47T4 1 a Q 2 J 

[(1 — w ) 2 + 4 w sin 2 #] QC 


1 + w — 2 sjw cos 9 


(B.99) 





(B.100) 


We now focus on the integral 

— r d9 sin~ 2e 9 


00 dw „ (1 — w ) 2 + 4 w sin 2 9 ac 


Jo w Jo 

Remapping to the unit interval, we find
ri 


1 + w — 2 y/w cos 9 


/ 0 


du 


»i 


u 




2 fll 


9 [(1 — u) 2 + 4u sin 2 6 \ 

d9sm~ 2 t 9 [K 1 J 


10 


= du 

Jo 


u -e + u -^d-2a)e 


1 + u — 2y/ucos9 


x / c/0sin 2e 9 [(1 — u) 2 + 4ic sin 2 9] ac (1 + u + 2\/ucos 9) (B.101) 

Jo 


The 9 integral can be performed in terms of hypergeometric functions using 

^ d9 sin -2e 9 [(1 - u) 2 + 4 u sin 2 9] ~ l+ae = ~ 

Jo Id - e \ 


(B.102) 


x {1 + u)- 1+2t ((1 - u) 2 ) 


2\ —1/2—(1—a)e 


2*1 


1/2 — e, —e + ae, 1 — e, 


4 u 


and 


(1 + u) 2 J ’ 

$$\int_0^\pi d\theta\,\sin^{-2\epsilon}\theta\,\left[(1-u)^2 + 4u\sin^2\theta\right]^{-1+\alpha\epsilon}\cos\theta = 0, \tag{B.103}$$

by symmetry.
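Two ingredients of this calculation lend themselves to a quick numerical check: the vanishing of the $\cos\theta$ term by symmetry about $\theta = \pi/2$, and the $O(\epsilon)$ expansion of the hypergeometric function with argument $4w/(1+w)^2$ quoted in Eq. (B.81). The sketch below uses a midpoint rule and a truncated Gauss series, both chosen by us for illustration; the parameter values in any call are arbitrary.

```python
import math


def theta_integral_with_cos(u, alpha, eps, n_points=2000):
    """Midpoint-rule estimate of the theta integral weighted by cos(theta)."""
    h = math.pi / n_points
    total = 0.0
    for i in range(n_points):
        th = (i + 0.5) * h
        s = math.sin(th)
        weight = s ** (-2.0 * eps) * ((1.0 - u) ** 2 + 4.0 * u * s * s) ** (-1.0 + alpha * eps)
        total += weight * math.cos(th)
    return total * h


def hyp2f1_series(a, b, c, x, n_terms=4000):
    """Gauss series for 2F1(a, b; c; x), adequate for 0 <= x < 1 away from x = 1."""
    total, term = 1.0, 1.0
    for n in range(n_terms):
        term *= (a + n) * (b + n) / ((c + n) * (n + 1.0)) * x
        total += term
    return total


def expansion_error(w, alpha, eps):
    """Difference between the full 2F1 and its expansion through O(eps)."""
    x = 4.0 * w / (1.0 + w) ** 2
    exact = hyp2f1_series(0.5 - eps, -eps + alpha * eps, 1.0 - eps, x)
    approx = 1.0 - 2.0 * (1.0 - alpha) * eps * math.log(1.0 + w)
    return exact - approx
```

The first function returns zero up to rounding, and the second difference shrinks like $\epsilon^2$ as $\epsilon$ is reduced, consistent with the truncation used in the text.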

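The hypergeometric function has the expansion

$${}_2F_1\left[\frac{1}{2}-\epsilon,\;-\epsilon+\alpha\epsilon;\;1-\epsilon;\;\frac{4u}{(1+u)^2}\right] = 1 - 2(1-\alpha)\,\epsilon\,\ln(1+u) + O(\epsilon^2), \tag{B.104}$$

which is valid for $0 < u < 1$.
The final $u$ integral is then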

F[1/2 — e]T[l/2] f 1 


r[i - e] 


du 


u 


€ + u _i + (i_ 2Q)e j (1 + u)2e ((1 _ u) 2)-1/2-(1-^ 

x (1 — 2(1 — a)eln(l + u)) 


(B.105) 


We expect this integral to contribute both $1/(1-\alpha)$ and $1/(1-2\alpha)$ poles, unlike the $n_a n_b$
contribution, which are evident in the $u \to 1$ and $u \to 0$ limits respectively. We need
to do the integral to $O(\epsilon)$ to get the finite pieces, but only $O(\epsilon^0)$ to get the anomalous
dimensions, which is sufficient for now. We have

_ T[1/2 — e]T[l/2] /•* _ e , n , „.\2e C/i „.\2\ — 1/2—(1 -a)e 


f duu € (1 + u) 2e ((1 — u) 2 ) 

Jo 


— 2(1 — a)e 


T[1 - 6] 

r[i/2 - e]r[i/2] r 1 duu - e ^ + u ^e ^ _ u ) 2 )- 1 / 2 -(i-^ log(1 + 
Jo 

—1/2—(1 -a)e 


F[1 ~ e] 


+ r[1/ p [x j^J 1/2] £ du u~ 1+(1 - 2a ^(l + u) 2t ((1 - u) 2 Y 
- 2(1 - a)e r t 1 / 2 ~ e ] r J 1 / 2 ] f 1 duu~ 1+yi ~ 2a ^(l + u) 2e ((1 - n) 2 ) _1/2_(1 ' a)£ log(l + u) 

Jo 


(B.106)

This integral can be done systematically using plus functions, but to the order we need the
result, it is easier to use subtractions, evaluate the log at the value of the singularity, and
then perform the integral in terms of hypergeometric functions. The integral can be written


T[l/2 — e]T[l/2] f 1 


r[l - e] J o 


f duu e (l + u) 2e ((1 — u) 2 ) 

Jo 


-2(1 -a)e 


T[1/2 — e]T[l/2] f 1 


2\ —1/2—(1—a)e 

2\ —1/2—(1—a)e , 


(B.107) 


F[1 - e] 


[ duu~ e {l + u) 2e ((1-u) 2 ) 1/2 (1 " )e log(2) 

Jo 


+ r[1/2 . e]r | 1/2] f 1 duu~ 1+ ^ 2a ^( 1 + U ) 2e ((1 - u )2)-V2-(l-a)6 

r[l-e] Jo 

- 2(1 - a)e r[1/2 ~" ] ^J 1/2] £ duu~ 1+ ^- 2a >{ 1 + u) 2e [((1 - uf) 


—1/2—(1 ~a)e 


- 1 


l°g(2), 


which gives 

_ T[l/2 — e]T[l/2] T[1 — e]T[—2(1 — a)e] 
T[1 — e] T[1 — e — 2(1 — a)e] 


»Fi[-2e, 1 - e; 1 - e - 2(1 - a)e; -1] 


. r[l/2 — elr[l/2] 1 rfl — elr[—2(1 — a)e] _ r /xi 

- 2(1 - a)e 1 j 7 log(2) 1 .V 2 F 1 [-2e, 1 - e; 1 - e - 2(1 - a)e; -1] 


+ 


r[l-e] T[l-e-2(l-a)e 

T[l/2 - e]T[l/2] T[(l - 2a)e]r[—2(1 - a)e] 


T[1 — e] T[(l — 2a)e — 2(1 — a)e] 


2 .Fi[— 2e, (1 - 2a)e; (1 - 2a)e - 2(1 - a)e; -1] 


-2( 1 - a) T' 1 /- ! ]r [ i/2] log(2) 


( T, /(/// rrr (1 - 2a) e ; (1 - 2a)e - 2(1 - a)e; -1] 


V T[(l — 2a)e — 2(1 — a)e] 


T[(l - 2a)e] 
T[1 + (1 — 2a) e] 


2 F 1 [-2e,l;l + (l-2a)e;-l] . 


Expanding this to O(e 0 ) gives 


7r 


7r 


2 T log(2) + 4Ag(2) +2Tlc[g(2) 


We then have 


5 'San(®3 a) ) = -5' 2 4c £ T(-2e) 


(a — l)e (2a — l)e 2a — 1 a — 1 

(«) _1+2 “V 


(B.108) 


(B.109) 


7T 


47T4 1 a Q 2 
7T 


27rlog(2) 47rlog(2) 

(a — l)e (2a — l)e 2a — 1 a — 1 


+ 27rlog(2)J . 
(B.110)

Extracting just the divergent pieces so as to get the anomalous dimensions, we find


41n(4 a) ) = 


O', a 


Oa 


log 


2'k(ol — l)e 2 27 t(2 a — l)e 2 

r„iV C 5e^(eW)(=^)- 1/a+ “ 

i°g 


+ 


nN cs etE(e { 2 } )( 


pilin' "n-"6 ’>-i/ 2+a 


2 1 -“Q 


a.. 


Of c 


7r(2a — l)e 


7r(a — l)e 


log( 2)«5 log( 2 )a s 


7r(a — l)e Tie 


(B.lll) 


which can be simplified to 


$c!Ln(4 a) ) = 


Oc 


2n(a — l)e 2 27r(2a — l)e 2 


+ 


CKc 


-L c ° e 


;(«) 


T CS ( ~(«T 

7r(2a — l)e Q V 3 ^ ’ 


(B.112) 


7r(a — l)e “ V 3 

where, as for the logarithm in the n a rib contribution, Eq. (B.89), the logarithm that appears 


is 


L a (4 a) ) = log 


VAT C ge^(4 a) )(n a -n fc )- 1/2+a ' 

V2Q 


(B.113) 


The contribution from an emission between the $n_b$ and $\bar n$ Wilson lines is identical, so
we have

$$S^{(1)}_{c,\,n_b\bar n} = S^{(1)}_{c,\,n_a\bar n}. \tag{B.114}$$


Note that for both the $\bar n n_a$ and $\bar n n_b$ contributions, and unlike for the $n_a n_b$ contribution,
we have $1/\epsilon$ contributions both of the soft form $1/(1-2\alpha)$, and of the collinear
form, $1/(1-\alpha)$. This will be crucial to achieve the cancellation of anomalous dimensions,
as required for the consistency of the collinear subjets factorization theorem.

It is interesting to note that this structure is very different from that which appeared
for the case of the $N$-subjettiness observable in Ref. [77]. In that case only a single angular
exponent appears throughout the calculation, unlike both the $1/(1-2\alpha)$ and $1/(1-\alpha)$
that we find here, and the divergent pieces of the $\bar n n_a$ and $\bar n n_b$ contributions vanish.


Cancellation of Anomalous Dimensions 


We now review the renormalization group evolution of each of the functions in the
factorization theorem, and show that the sum of the anomalous dimensions vanishes, as required for
renormalization group consistency.

The hard function satisfies a multiplicative RGE, given by

$$\mu\frac{d}{d\mu}\log H(Q^2,\mu) = \gamma_H(Q^2,\mu) = 2\,\mathrm{Re}\left[\gamma_C(Q^2,\mu)\right], \tag{B.115}$$

where

$$\gamma_C(Q^2,\mu) = \frac{\alpha_s C_F}{4\pi}\left(4\log\left[\frac{-Q^2}{\mu^2}\right] - 6\right) \tag{B.116}$$

is the anomalous dimension of the dijet Wilson coefficient. Explicitly

$$\gamma_H(Q^2,\mu) = \frac{\alpha_s C_F}{2\pi}\left[4\log\left[\frac{Q^2}{\mu^2}\right] - 6\right]. \tag{B.117}$$

The anomalous dimension of the hard splitting function $H_2$ can be extracted from
Ref. [77] by performing a change of variables. It satisfies a multiplicative RGE

$$\mu\frac{d}{d\mu}H_2(t,x,\mu) = \gamma_{H_2}(t,x,\mu)\,H_2(t,x,\mu), \tag{B.118}$$

with anomalous dimension

$$\gamma_{H_2}(t,x,\mu) = \frac{\alpha_s(\mu)}{2\pi}\left[2C_A\log\frac{t}{\mu^2} + 4\left(C_F - \frac{C_A}{2}\right)\log x + 2C_A\log(1-x) - \beta_0\right]. \tag{B.119}$$

Here $\beta_0$ is defined with the normalization

$$\beta_0 = \frac{11\,C_A}{3} - \frac{2\,n_f}{3}. \tag{B.120}$$

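Converting to $e_2^{(\alpha)}$ by performing the change of variables given in Eq. (B.35), we find

$$\gamma_{H_2}\left(e_2^{(\alpha)},z_q,\mu\right) = \frac{\alpha_s(\mu)}{2\pi}\left[2C_A\log\left(\frac{Q^2 (z_a z_b)^{1-2/\alpha}\left(e_2^{(\alpha)}\right)^{2/\alpha}}{4\mu^2}\right) + 4\left(C_F - \frac{C_A}{2}\right)\log z_q + 2C_A\log(1-z_q) - \beta_0\right]. \tag{B.121}$$

Since the anomalous dimensions of the jet, soft and collinear-soft functions are written in
terms of $e_3^{(\alpha)}$, $z_a$, $z_b$, and $n_a\cdot n_b$, for demonstrating cancellation of anomalous dimensions,
it is convenient to replace $e_2^{(\alpha)}$ in Eq. (B.121) with its leading power expression from
Eq. (B.13). We then have

$$\gamma_{H_2} = \frac{\alpha_s(\mu)}{2\pi}\left[2C_A\log\left(\frac{Q^2\,z_a z_b\,n_a\cdot n_b}{2\mu^2}\right) + 4\left(C_F - \frac{C_A}{2}\right)\log z_q + 2C_A\log(1-z_q) - \beta_0\right]. \tag{B.122}$$

Note that $1 - z_q = z_g$.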

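The jet functions satisfy multiplicative RGEs in Laplace space (they satisfy convolutional
RGEs in $e_3^{(\alpha)}$; see Ref. [110] for a detailed discussion)

$$\mu\frac{d}{d\mu}\log J_{g,q\,n_a}\left(Q_J,\tilde e_3^{(\alpha)}\right) = \gamma^{\alpha}_{g,q}\left(Q_J,\tilde e_3^{(\alpha)}\right), \tag{B.123}$$

where the one-loop anomalous dimension is determined from Eqs. (B.60) and (B.61), and
is given by

$$\gamma^{\alpha}_{g,q}\left(Q_J,\tilde e_3^{(\alpha)}\right) = -\frac{2\alpha_s C_{g,q}}{\pi(1-\alpha)}\,L_\alpha^{J,a}\left(\tilde e_3^{(\alpha)}\right) + \gamma_{g,q}, \tag{B.124}$$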
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where the logarithm La’ a ^4°^) was defined in Eq. (B.62), and is given by 


Li 


J,a _ 


= log 




(B.125) 


Here $C_{g,q}$ is the appropriate Casimir ($C_A$ for gluon jets and $C_F$ for quark jets), and
$\gamma_{g,q}$ are the standard functions

$$\gamma_q = \frac{3\,\alpha_s C_F}{2\pi}, \qquad \gamma_g = \frac{\alpha_s}{\pi}\,\frac{11\,C_A - 2\,n_f}{6}. \tag{B.126}$$

For subjet $b$, we simply have $a \to b$.

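Similarly, the soft function satisfies a multiplicative RGE in Laplace space

$$\mu\frac{d}{d\mu}\log S_G\left(\tilde e_3^{(\alpha)}\right) = \gamma_G\left(\tilde e_3^{(\alpha)}\right), \tag{B.127}$$

with one-loop anomalous dimension determined by Eq. (B.50), and given by

$$\gamma_G\left(\tilde e_3^{(\alpha)}\right) = -\frac{2\alpha_s}{\pi(1-2\alpha)}\,\mathbf{T}_n\cdot\mathbf{T}_{\bar n}\,L_\alpha^{G}\left(\tilde e_3^{(\alpha)}\right). \tag{B.128}$$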


Here the logarithm is given by 


L a (4 0) ) = lo g 


e lE Ns 
2 1 ~ a Q 


(1 - 2a) 


fog 


tan^ 


,R 


(B.129) 


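Finally, the collinear soft function satisfies a multiplicative RGE in Laplace space

$$\mu\frac{d}{d\mu}\log S_{n_a n_b \bar n}\left(\tilde e_3^{(\alpha)}\right) = \gamma_{cs}\left(\tilde e_3^{(\alpha)}\right), \tag{B.130}$$

with the one-loop anomalous dimension determined by Eqs. (B.88) and (B.111)

$$\gamma_{cs}\left(\tilde e_3^{(\alpha)}\right) = \mathbf{T}_a\cdot\mathbf{T}_b\,\gamma_{ab}\left(\tilde e_3^{(\alpha)}\right) + \mathbf{T}_a\cdot\mathbf{T}_{\bar n}\,\gamma_{a\bar n}\left(\tilde e_3^{(\alpha)}\right) + \mathbf{T}_b\cdot\mathbf{T}_{\bar n}\,\gamma_{b\bar n}\left(\tilde e_3^{(\alpha)}\right), \tag{B.131}$$

where

$$\gamma_{ab}\left(\tilde e_3^{(\alpha)}\right) = -\frac{4\alpha_s}{\pi(1-\alpha)}\,L_\alpha^{cs}\left(\tilde e_3^{(\alpha)}\right), \tag{B.132}$$

$$\gamma_{a\bar n}\left(\tilde e_3^{(\alpha)}\right) = \gamma_{b\bar n}\left(\tilde e_3^{(\alpha)}\right) = -\frac{2\alpha_s}{\pi(1-\alpha)}\,L_\alpha^{cs}\left(\tilde e_3^{(\alpha)}\right) + \frac{2\alpha_s}{\pi(1-2\alpha)}\,L_\alpha^{cs}\left(\tilde e_3^{(\alpha)}\right). \tag{B.133}$$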


The argument of the logarithm appearing in the collinear soft function, was defined in 
Eq. (B.89), and is given by 


T cs _ , r ( ^Ncse lE i4 a) ) ( n « ' n b ) 1/2+ “ 
“ “ g [ 


(B.134) 


We can now explicitly check the cancellation of anomalous dimensions. We consider
the particular partonic subprocess $e^+e^- \to q\bar q \to q\bar q g$ for which we have explicitly given
the hard splitting function, in which case the color algebra can be simplified and written
entirely in terms of Casimirs using the color conservation relations

$$\mathbf{T}_n = \mathbf{T}_q + \mathbf{T}_g, \tag{B.135}$$

$$\mathbf{T}_n + \mathbf{T}_{\bar n} = 0. \tag{B.136}$$

We then have

$$\mathbf{T}_n\cdot\mathbf{T}_{\bar n} = -C_F, \tag{B.137}$$

$$\mathbf{T}_q\cdot\mathbf{T}_g = -\frac{C_A}{2}, \tag{B.138}$$

$$\mathbf{T}_q\cdot\mathbf{T}_{\bar n} = \frac{C_A}{2} - C_F, \tag{B.139}$$

$$\mathbf{T}_g\cdot\mathbf{T}_{\bar n} = -\frac{C_A}{2}, \tag{B.140}$$

$$\mathbf{T}_g\cdot\mathbf{T}_g = C_A, \tag{B.141}$$

$$\mathbf{T}_n\cdot\mathbf{T}_n = C_F. \tag{B.142}$$

However, for most of the cancellation of the anomalous dimensions, it will be convenient
to work in the abstract color notation, so as not to need to use relations between the color
Casimirs.
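The products in Eqs. (B.137)-(B.140) follow from color conservation alone, and this can be verified with exact rational arithmetic. The sketch below is a minimal illustration for a general number of colors; the function name and dictionary keys are our own labels.

```python
from fractions import Fraction


def color_products(n_colors=3):
    """Products T_i.T_j for q -> q g recoiling against an antiquark along nbar."""
    cf = Fraction(n_colors**2 - 1, 2 * n_colors)
    ca = Fraction(n_colors)
    # T_n = T_q + T_g with T_n^2 = T_q^2 = C_F and T_g^2 = C_A fixes T_q.T_g
    tq_tg = (cf - cf - ca) / 2
    # T_nbar = -T_n = -(T_q + T_g)
    tq_tnbar = -(cf + tq_tg)
    tg_tnbar = -(tq_tg + ca)
    tn_tnbar = -cf
    return {"CF": cf, "CA": ca, "q.g": tq_tg, "q.nbar": tq_tnbar,
            "g.nbar": tg_tnbar, "n.nbar": tn_tnbar}
```

As a consistency check, the square of the total color charge, $(\mathbf{T}_q + \mathbf{T}_g + \mathbf{T}_{\bar n})^2$, assembled from these products vanishes for any $N$.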

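The independence of the total cross section under renormalization group evolution
implies the following relation between anomalous dimensions

$$\gamma_H(Q^2,\mu) + \gamma_{H_2}\left(e_2^{(\alpha)},z_q,\mu\right) + \gamma^{\alpha}_g\left(\tilde e_3^{(\alpha)}\right) + \gamma^{\alpha}_q\left(\tilde e_3^{(\alpha)}\right) + \gamma_G\left(\tilde e_3^{(\alpha)}\right) + \gamma_{cs}\left(\tilde e_3^{(\alpha)}\right) \sim 0, \tag{B.143}$$

where the $\sim$ means up to a term corresponding to the measurement of the jet in the $\bar n$
direction, and the out-of-jet contribution to the soft function, which is independent of the
$e_3^{(\alpha)}$ measurement, and the kinematics of the substructure, namely $n_a\cdot n_b$, $z_a$, and $z_b$. We
will make this relation precise shortly.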

We now show explicitly that this cancellation occurs, and how it arises, which provides
a non-trivial cross-check on the collinear subjets factorization theorem. Substituting in the
expressions above, we find


^2 = 1 H (Q' 2 ,2 + 1h 2 ( 4 a \z q , 


+ 


4 a,L 


-T„ • T,. 


s^ a 


;(<*) 


7r(l — a) 


—2T •T- 

Z ' i a - 1 - n 


T • T- 

■*-n -*-71 


a s L™ 


;(«) 


Oi s L a 


7r(l — a) 

(4“’) 

7r(l — 2a) 



- 2 T b ■ T n 


a s L c a s 


;(«) 


2 a.L 3 n ( e. 


2a ,Ll | e 


(a) 

- Ca - n v V + 7s - Cf - 

7r(l — a) 7r(l — a) 


a.,L' 


s^ a 




7r(l — a) 


7r(l — 2a) 


+ 7g- (B.144) 
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To make manifest the separate cancellations, we use the color conservation relation T n = 
$\mathbf{T}_a + \mathbf{T}_b$ in the soft anomalous dimension, and $\mathbf{T}_{\bar n} = -\mathbf{T}_a - \mathbf{T}_b$ in the $1/(1-\alpha)$ pieces of
the collinear soft anomalous dimensions. Grouping together collinear like terms ($1/(1-\alpha)$)
and soft like terms ($1/(1-2\alpha)$), we then have


'^2 = 'yH (Q 2 ,v) + 7 h 2 ( 4 Q) > 

7 

2 a s L% (ef } ) 


— (T a + Tfe) • Tjj 


-C A 



7r(l — 2a) 


T T - 

a - 1 - n 


2 a s L“ 


;(«) 


7r(l — 2a) 


+ Tb • 


2 a s L c a s 


- Tn T 


T a • (—T a - T 6 


2 a s L c a s 




7r(l — a) 


+ T b .(-T a -T b ) 


‘2,Oi s lP OL I Og 


2 CX s L a I Cg 


(a) 


T\ - 7 - 1 " 7 g - Cf - 7Z -\- 1 - 7 q ■ 

7r( 1 — a) 7r(l — a) 



Since all the logs are linear in $\tilde e_3^{(\alpha)}$, we immediately see that the color conservation relations
have led to the cancellation of the $\tilde e_3^{(\alpha)}$ dependence in the soft like pieces between the
$\bar n n_b$ and $\bar n n_a$ contributions to the collinear soft function with the global soft contribution,
and the cancellation between the collinear like pieces involves all three contributions to the
collinear soft function, as well as the jet functions. This nontrivial cancellation supports
the validity of the collinear subjets factorization theorem.

It is also straightforward to check that the dependence on as well as on the jet 
energy fractions also cancels, although this is more tedious to perform step by step. We 
therefore simply quote the summed result of the anomalous dimensions, to make clear the 
meaning of the equivalence relation in Eq. (B.143). We have 


7 H {Q 2 ,^) + 1h 2 (e^\z qi /i) + 7 “ (ef } ) + 7 “ (4°°) + 1G (4°°) + 7es (& 

2 

3 a s Cp a s Cp log [tan 2 f ] ol s C f log ^2 


(«) 

3 


2 vr 


7T 


7T 


(B.146) 


These remaining terms are exactly those expected to cancel against the out-of-jet
contribution; see, e.g., Ref. [110] for a detailed discussion.

The out-of-jet jet function is then given by the unmeasured jet function of Ref. [110] 


$$\mu \frac{d}{d\mu} \ln J_{\bar n}(R_B) = \frac{2\alpha_s C_F}{\pi} \log\left[\frac{\mu}{Q \tan\frac{R_B}{2}}\right] + \frac{3\alpha_s C_F}{2\pi}, \qquad \text{(B.147)}$$

where $R_B$ is the radius of the recoiling jet. For simplicity, throughout this paper, we
have taken $R_B = R$.

The out-of-jet contribution to the soft function has a pure cusp anomalous dimension [110]
C One Loop Calculations of Soft Subjet Functions

In this appendix we give the operator definitions and one-loop results for the functions 
appearing in the factorization theorem of Eq. (3.17) for the soft subjet region of phase 
space. The factorization theorem in the soft subjet region of phase space was first presented 
in Ref. [76], where all functions were calculated to one-loop, and a detailed discussion 
of the structure of the required zero bin subtractions was given. This calculation was
performed with a broadening axis cone algorithm; however, it was argued in Sec. 3.1.2 that
to leading power, the factorization theorem is identical in the case of an anti-$k_T$ algorithm.
Because of this, in this appendix we give only the final results for the one-loop anomalous
dimensions, and the tree level matching for the soft subjet production, as are required for
the resummation considered in this paper. The interested reader is referred to Ref. [76] for
the detailed calculation, as well as a discussion of the intricate zero bin structure of the 
factorization theorem, which is only briefly mentioned in this appendix. 

Definitions of Factorized Functions 

The functions appearing in the soft subjet factorization theorem of Eq. (3.17) have the 
following SCET operator definitions: 

• Hard Matching Coefficient for Dijet Production

$$H(Q^2,\mu) = |C(Q^2,\mu)|^2, \qquad \text{(C.1)}$$

where $C(Q^2,\mu)$ is the Wilson coefficient obtained from matching the full theory QCD
current onto the SCET dijet operator $\bar\chi_n \Gamma \chi_{\bar n}$

$$\langle q\bar q|\bar\psi\Gamma\psi|0\rangle = C(Q^2,\mu)\,\langle q\bar q|\mathcal{O}_2|0\rangle. \qquad \text{(C.2)}$$

As before, we have neglected the contraction with the leptonic tensor.

• Soft Subjet Jet Function: 

Jn sj (4“ } ) = (C.3) 

^tv(0\B^(0)Qo(B)6(Q S j - n sr V)6^\V ±SJ )6^ - Q fj E 3 ^\ sj ) B^C0)|0) 

• Jet Function: 

J n(ei a) ) = ^tr(0\^x n (0)e o (B)6(Q-n-r)6^\V ± )6(e^-Q F jE 3 ^\ HJ )xn(0m (C.4) 

• Boundary Soft Function: 

Sn sj n sj (4 a) ; R) = ^mS nsj S nsj }@o(B)6(e^ - e F jE 3 ^\ BS )f{S ntj S n J\0) 

(C.5) 
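• Soft Subjet Soft Function:

$$S_{n_{sj} n \bar n}(e_3^{(\alpha)}; B, R) = \mathrm{tr}\langle 0|T\{S_{n_{sj}} S_n S_{\bar n}\}\,\Theta_O(B)\,\delta\big(e_3^{(\alpha)} - \Theta_{FJ}\mathbf{E}_3^{(\alpha)}\big|_{S}\big)\,\bar T\{S_{n_{sj}} S_n S_{\bar n}\}|0\rangle \qquad \text{(C.6)}$$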

The definitions of these functions include measurement operators, which when acting
on the final state, return the value of a given observable. The operator $\mathbf{E}_3^{(\alpha)}$ measures the
contribution to $e_3^{(\alpha)}$ from final states, and must be appropriately expanded following the
power counting of the sector on which it acts. Expressions for the expansions in the power
counting of the different sectors will be given shortly, after kinematic notation has been set
up. The operators $\Theta_{FJ}$ and $\Theta_O$ constrain the measured radiation to be in the jet or out
of the jet, respectively, and will be defined shortly.

Kinematics and Notation 

For our general kinematic setup, we will denote by $Q$ the center of mass energy of the $e^+e^-$
collisions, so that $Q/2$ is the energy deposited in a hemisphere, i.e. the four-momenta of
the two hemispheres are

$$p_{\text{hemisphere}_1} = \left(\frac{Q}{2}, \vec p_1\right), \qquad p_{\text{hemisphere}_2} = \left(\frac{Q}{2}, -\vec p_1\right), \qquad \text{(C.7)}$$

so

$$s = Q^2. \qquad \text{(C.8)}$$

We are now interested in the regime where there is a wide angle soft subjet carrying a
small energy fraction, and an energetic subjet, carrying the majority of the energy fraction.
We will label the lightcone directions of the energetic subjet by $n, \bar n$, and the lightcone
directions of the soft subjet as $n_{sj}, \bar n_{sj}$. We will use the variable $z_{sj}$ to label the energy
fraction of the soft subjet, namely

$$E_{sj} = z_{sj} \frac{Q}{2}, \qquad z_{sj} \ll 1. \qquad \text{(C.9)}$$

In this region of phase space, to leading power the value of the two point energy
correlation function is set by the two subjets, and is given by

$$e_2^{(\alpha)} = 2^{\alpha/2} z_{sj} \left(n \cdot n_{sj}\right)^{\alpha/2}. \qquad \text{(C.10)}$$

The action of the measurement function $\mathbf{E}_3^{(\alpha)}$ on an arbitrary state for each of the
factorized sectors contributing to the 3-point energy correlation function measurement is
given by
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ki,kj £Xhj 
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sj / > 


Q Q \n ■ kin ■ k 


X, 


hj / ) 


= E Xbs~ 

k£X bs 


n s j ■ k (n s j ■ k\ 2 


Q \n s j ■ k 


X bs ), 




k£X s 


Q V k° k° 


X « 


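where, for simplicity, we have extracted the normalization factors

$$N_{SJ} = 2^{5\alpha/2} (n \cdot n_{sj})^{\alpha}, \qquad N_{HJ} = 2^{5\alpha/2} z_{sj} (n \cdot n_{sj})^{\alpha},$$

$$N_{BS} = 2^{2\alpha} z_{sj} (n \cdot n_{sj})^{\alpha}, \qquad N_{S} = 2^{1+3\alpha/2} z_{sj} (n \cdot n_{sj})^{\alpha/2}.$$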


(C.ll) 

(C.12) 

(C.13) 

(C.14) 

(C.15) 

(C.16) 


These expressions follow from properly expanding the definition of the energy correlation
function measurements in the power counting of each of the sectors. Note that on the
jet sectors, the 3-point correlation measurement becomes an effective 2-point correlation 
measurement, since the 2-point energy correlation function is set by the initial splitting of 
the subjet. 
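
As a rough numerical illustration of the leading power statement of Eq. (C.10), the sketch below compares the two-point energy correlation function of an idealized jet made of exactly two massless particles with its soft subjet approximation. The two-particle definition used here, $e_2 = z_1 z_2 (2\, n_1 \cdot n_2)^{\alpha/2}$ with $z_1 = 1 - z_{sj}$, is an assumption chosen to be consistent with Eq. (C.10); the function names are hypothetical.

```python
import math

def e2_two_particles(z_sj, theta, alpha):
    """Two-point energy correlation for an idealized two-particle jet.

    Assumption: e2 = z1 * z2 * (2 * n1.n2)**(alpha/2), with energy fractions
    z1 = 1 - z_sj, z2 = z_sj and n1.n2 = 1 - cos(theta) for massless particles.
    """
    n_dot_nsj = 1.0 - math.cos(theta)
    return (1.0 - z_sj) * z_sj * (2.0 * n_dot_nsj) ** (alpha / 2.0)

def e2_leading_power(z_sj, theta, alpha):
    """Soft subjet approximation 2**(alpha/2) * z_sj * (n.n_sj)**(alpha/2)."""
    n_dot_nsj = 1.0 - math.cos(theta)
    return 2.0 ** (alpha / 2.0) * z_sj * n_dot_nsj ** (alpha / 2.0)

if __name__ == "__main__":
    # The relative difference shrinks linearly with z_sj, as expected
    # for a power correction in the soft energy fraction.
    for z in (0.1, 0.01, 0.001):
        exact = e2_two_particles(z, 0.8, 2.0)
        approx = e2_leading_power(z, 0.8, 2.0)
        print(z, exact, approx, abs(approx - exact) / exact)
```

In this toy setup the two expressions differ by the factor $1/(1 - z_{sj})$, independent of the angle, which is the expected size of the neglected power corrections.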

The in-jet restriction, $\Theta_{FJ}$, is given by

$$\Theta_{FJ}(k) = \Theta\left(\tan^2\frac{R}{2} - \frac{n \cdot k}{\bar n \cdot k}\right). \qquad \text{(C.17)}$$

The jet restriction must also be expanded following the power counting of the given sector.
We will see that this is actually quite subtle for the soft subjet modes, since the angle
between the soft subjet axis and the boundary of the jet has a non-trivial power counting.
In particular, the expansion of $\Theta_{FJ}(k)$ is different for the soft subjet jet and boundary soft
modes, and will demonstrate the necessity of performing the complete factorization of the
soft subjet dynamics into jet and boundary soft modes. Finally, since we are considering
the case where the out-of-jet scale $B$ is much less than the in-jet scale, the operator
$\Theta_O(B)$ must also be included in the definition of the soft subjet functions. This operator vetoes
out-of-jet radiation above the scale $B$. The explicit expression for $\Theta_O(B)$ expanded in the
power counting of each of the factorized sectors can be found in Ref. [76].
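
The unexpanded in-jet restriction can be sketched as follows, assuming it takes the form $\Theta(\tan^2\frac{R}{2} - n\cdot k/\bar n \cdot k)$ with $n = (1, 0, 0, 1)$ and $\bar n = (1, 0, 0, -1)$. For a massless momentum at polar angle $\theta$ to the jet axis the light-cone ratio is $\tan^2(\theta/2)$, so the restriction reduces to $\theta < R$. The function names are hypothetical.

```python
import math

def light_cone_ratio(theta):
    """n.k / nbar.k for a massless momentum at polar angle theta to the axis n.

    With n = (1, 0, 0, 1) and nbar = (1, 0, 0, -1) one has
    n.k = E * (1 - cos(theta)) and nbar.k = E * (1 + cos(theta)).
    """
    return (1.0 - math.cos(theta)) / (1.0 + math.cos(theta))

def theta_fj(theta, R):
    """Return 1 if the momentum lies inside the jet of radius R, else 0."""
    return 1 if math.tan(R / 2.0) ** 2 - light_cone_ratio(theta) > 0.0 else 0

if __name__ == "__main__":
    R = 1.0
    for theta in (0.2, 0.9, 1.1, 2.0):
        print(theta, theta_fj(theta, R))
```

This only illustrates the full restriction; the expansions appropriate to each factorized sector are not reproduced here.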


Hard Matching Coefficient for Dijet Production 

The hard matching coefficient for dijet production, $H(Q^2,\mu)$, is identical to that for the
collinear subjets factorization theorem by hard-collinear-soft factorization, and is given in
Eq. (B.30).

Hard Matching for Soft Subjet Production

The hard matching coefficient $H^{sj}_{n\bar n}(z_{sj}, n_{sj})$ is determined by the finite parts of the logarithm
of the soft matrix element for a single soft state

$$H^{sj}_{n\bar n}(z_{sj}, n_{sj}) = \mathrm{tr}\langle 0|T\{S_n S_{\bar n}\}|sj\rangle\langle sj|\bar T\{S_n S_{\bar n}\}|0\rangle_{\text{fin}}. \qquad \text{(C.18)}$$


The virtual corrections of the effective theory cancel the IR divergences of this matrix 
element, giving a finite matching coefficient. This matrix element can be calculated from 
the square of the soft gluon current [194, 195], which is known to two loop order [196, 197]. 
The tree level hard matching coefficient for the soft subjet production is given by 


$$H^{sj\,(\text{tree})}_{n\bar n} = \frac{\alpha_s C_F}{\pi}\,\frac{1}{z_{sj}}\,\frac{n \cdot \bar n}{n \cdot n_{sj}\;\, n_{sj} \cdot \bar n}. \qquad \text{(C.19)}$$


The results of Ref. [195] can be used to determine the soft subjet production matching 
from an arbitrary number of hard jets at one loop. 
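
A small numerical sketch of the tree level soft subjet matching is given below. It assumes the coefficient has the eikonal form $(\alpha_s C_F/\pi)\,(1/z_{sj})\; n\cdot\bar n / (n \cdot n_{sj}\; n_{sj}\cdot\bar n)$ and parametrizes the soft subjet direction by its polar angle $\theta_{sj}$ to the $n$ axis, for back-to-back $n$ and $\bar n$. The function names and default $C_F = 4/3$ are illustrative choices.

```python
import math

def eikonal_factor(theta_sj):
    """n.nbar / (n.n_sj * n_sj.nbar) for back-to-back n and nbar.

    With n.nbar = 2, n.n_sj = 1 - cos(theta_sj), n_sj.nbar = 1 + cos(theta_sj),
    this reduces to 2 / sin(theta_sj)**2.
    """
    n_nbar = 2.0
    n_nsj = 1.0 - math.cos(theta_sj)
    nsj_nbar = 1.0 + math.cos(theta_sj)
    return n_nbar / (n_nsj * nsj_nbar)

def tree_level_soft_matching(alpha_s, z_sj, theta_sj, C_F=4.0 / 3.0):
    """Assumed tree level form: (alpha_s * C_F / pi) / z_sj * eikonal factor."""
    return alpha_s * C_F / math.pi / z_sj * eikonal_factor(theta_sj)

if __name__ == "__main__":
    print(tree_level_soft_matching(alpha_s=0.1, z_sj=0.05, theta_sj=0.6))
```

The collinear singularities at $\theta_{sj} \to 0$ and $\theta_{sj} \to \pi$ and the soft singularity at $z_{sj} \to 0$ are visible directly in this form.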


Anomalous Dimensions 


In this section we collect the one-loop anomalous dimensions for all the functions calculated 
in this appendix. The two hard functions satisfy multiplicative renormalization group 
equations. For the dijet production hard function, we have 


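$$\mu\frac{d}{d\mu}\ln H(Q^2,\mu) = \gamma_H(Q^2,\mu) = 2\,\mathrm{Re}\left[\gamma_C(Q^2,\mu)\right]. \qquad \text{(C.20)}$$

Explicitly

$$\gamma_H(Q^2,\mu) = \frac{\alpha_s C_F}{2\pi}\left[4\log\frac{Q^2}{\mu^2} - 6\right]. \qquad \text{(C.21)}$$

For the soft subjet production hard function, we have

$$\mu\frac{d}{d\mu}\ln H^{sj}_{n\bar n}(z_{sj}, n_{sj}, \mu) = -\frac{\alpha_s C_A}{\pi}\ln\left[\frac{2\mu^2\; n \cdot \bar n}{Q^2 z_{sj}^2\;\, n \cdot n_{sj}\;\, n_{sj} \cdot \bar n}\right] - \frac{\alpha_s \beta_0}{2\pi}. \qquad \text{(C.22)}$$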
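
To illustrate how a multiplicative renormalization group equation of this type is used, the sketch below evaluates a one-loop dijet hard anomalous dimension of the form $\gamma_H = \frac{\alpha_s C_F}{2\pi}\left[4\log(Q^2/\mu^2) - 6\right]$ and the corresponding evolution factor. Two simplifying assumptions are made purely for illustration: the coupling is held fixed rather than run, and $C_F = 4/3$. The function names are hypothetical.

```python
import math

C_F = 4.0 / 3.0

def gamma_hard(alpha_s, Q, mu):
    """One-loop dijet hard anomalous dimension (assumed form, fixed alpha_s)."""
    return alpha_s * C_F / (2.0 * math.pi) * (4.0 * math.log(Q**2 / mu**2) - 6.0)

def hard_evolution_fixed_coupling(alpha_s, Q, mu0, mu1):
    """H(mu1) / H(mu0) = exp( integral of gamma_H d(ln mu) from mu0 to mu1 ).

    With fixed coupling the integrand is linear in ln(mu), so the integral
    can be done in closed form.
    """
    a = alpha_s * C_F / (2.0 * math.pi)
    L0, L1, LQ = math.log(mu0), math.log(mu1), math.log(Q)
    exponent = a * (8.0 * LQ * (L1 - L0) - 4.0 * (L1**2 - L0**2) - 6.0 * (L1 - L0))
    return math.exp(exponent)

if __name__ == "__main__":
    # At the canonical scale mu = Q the logarithm vanishes and only the
    # non-cusp term -3 * alpha_s * C_F / pi remains.
    print(gamma_hard(0.118, 1000.0, 1000.0))
    print(hard_evolution_fixed_coupling(0.118, 1000.0, 50.0, 1000.0))
```

In an actual resummation the running of $\alpha_s$ must be included in the integral; the fixed-coupling kernel is only meant to make the structure of the evolution explicit.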


The jet, boundary soft, and global soft functions satisfy multiplicative renormalization
group equations in Laplace space, where the Laplace conjugate variable to $e_3^{(\alpha)}$ will be
denoted $\tilde e_3^{(\alpha)}$.

The jet function for the soft subjet satisfies the RGE 


d 1 T 
fi— In ,J r 

a/i 


~(a) 
, Co 
S3 V O 


= -4 


o-sCa 

27 t (1 — a) 


log 


?“ a / 2 p (")plE r 2 


"sj 


/I 


'SJ 


Q 


N S j 


Of c 


+ ^/3 0 , (C.23) 


where the normalization factor $N_{SJ}$ was defined in Eq. (C.15). We have assumed that the
soft subjet is a gluon jet, as it is this case that exhibits the soft singularity of QCD.

The jet function for the hard subjet, which we have assumed to be a quark jet, satisfies 
the RGE 
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CX-sCf 
27 t (1 — a) 


log 


j—ol/ 2 ~{q) 7 b 



+ 


3 cx s C f 
2 tt 


(C.24) 
where the normalization factor $N_{HJ}$ was defined in Eq. (C.15).

Since the soft subjet factorization theorem is sensitive to the boundary of the jet, it is 
also necessary to include out-of-jet contributions. We assume that nothing is measured on 
the recoiling jet. The out-of-jet jet function is then given by the unmeasured jet function 
of Ref. [110] 


$$\mu \frac{d}{d\mu} \ln J_{\bar n}(R_B) = \frac{2\alpha_s C_F}{\pi} \log\left[\frac{\mu}{Q \tan\frac{R_B}{2}}\right] + \frac{3\alpha_s C_F}{2\pi}, \qquad \text{(C.25)}$$

where $R_B$ is the radius of the recoiling jet. For simplicity, throughout this paper, we
have taken $R_B = R$.
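
As an illustration of how a canonical scale is read off from such an anomalous dimension, the sketch below assumes the unmeasured jet function anomalous dimension has the form $\frac{2\alpha_s C_F}{\pi}\log\left[\mu/(Q\tan\frac{R_B}{2})\right] + \frac{3\alpha_s C_F}{2\pi}$ and picks the scale at which the logarithm vanishes. The function names and the default $C_F = 4/3$ are illustrative.

```python
import math

def gamma_unmeasured_jet(alpha_s, mu, Q, R_B, C_F=4.0 / 3.0):
    """Assumed one-loop anomalous dimension of the unmeasured (out-of-jet) jet function."""
    cusp = 2.0 * alpha_s * C_F / math.pi * math.log(mu / (Q * math.tan(R_B / 2.0)))
    non_cusp = 3.0 * alpha_s * C_F / (2.0 * math.pi)
    return cusp + non_cusp

def canonical_scale_unmeasured_jet(Q, R_B):
    """Scale at which the logarithm in the anomalous dimension vanishes."""
    return Q * math.tan(R_B / 2.0)

if __name__ == "__main__":
    Q, R = 1000.0, 1.0
    mu_can = canonical_scale_unmeasured_jet(Q, R)
    print(mu_can, gamma_unmeasured_jet(0.118, mu_can, Q, R))
```

At the canonical scale only the non-cusp term survives, which is the sense in which this choice minimizes the logarithms.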

The boundary soft function satisfies the RGE


V n5 ^- 
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log 
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SJ 


4 R 
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— ( 1 —a) 
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n ■ n 


SJ 


R 


n ■ n s j tan 2 2 


(C.26) 


where the normalization factor $N_{BS}$ was defined in Eq. (C.15).

For the soft function, it is necessary to perform a refactorization into in-jet and out-of-jet
contributions along the lines of Ref. [110]. This is particularly important in the present
case, since as was discussed in detail in Ref. [76], the out-of-jet contribution to the soft
function is sensitive to a large logarithm of the difference between $\tan^2\frac{\theta_{sj}}{2}$ and $\tan^2\frac{R}{2}$, but due to zero bin
subtractions, the in-jet contribution to the soft function does not exhibit such a sensitivity.

The in-jet anomalous dimension has both $C_A$ and $C_F$ contributions. It is given by
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log[T] + —4^-log 


2 C 


7r 
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log 


tan 2 dd 


(n • n s j) 2 tan 2 


2 tan dd 
tan 4 1 


(C.27) 


(C.28) 


where in the first equality we have separated the contributions from a gluon between the 
three different Wilson lines, and to simplify the expression we have extracted the argument 
of the logs 


-(a) 


T = e lE N s - 




Q t&n 1 ~ a -¥- 
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sj 
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(C.29) 
We choose the canonical scale for the in-jet soft function by minimizing the arguments of
the $C_A$ log. Namely, we rewrite the anomalous dimension as


7 g J C A_+2 c F)a a ]og 


7r(l — a) 


tan §■ 


2 (1—0:) 
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2 Cfol s 
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' tan °f ' 
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1 — Ct 
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(C.30) 


The argument of the second logarithm is formally an $\mathcal{O}(1)$ number in the soft subjet region
of phase space, and is treated as the non-cusp anomalous dimension. The argument of the
first logarithm is used to set the scale.

The out-of-jet anomalous dimension is purely non-cusp, and is given by

The natural scale for the out-of-jet soft function is

A^out = 


(C.31) 


tan 2 y 

o-sCa , 

1 

tan 2 — tan 2 ^ 
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(C.32) 


where B is the out-of-jet scale. We set B = Q as discussed in Sec. 3.1.2. 

For consistency of our soft subjet factorization theorem, the sum of the anomalous 
dimensions listed above should cancel. Indeed, one can explicitly check that the anomalous 
dimensions satisfy the consistency condition 


$$\mu\frac{d}{d\mu}\ln H(Q^2,\mu) + \mu\frac{d}{d\mu}\ln H^{sj}_{n\bar n}(z_{sj}, n_{sj}, \mu) + \mu\frac{d}{d\mu}\ln J_{n_{sj}}(e_3^{(\alpha)}) + \mu\frac{d}{d\mu}\ln J_{hj}(e_3^{(\alpha)}) + \cdots = 0. \qquad \text{(C.33)}$$


This cancellation is highly non-trivial, involving intricate cancellations between a large 
number of scales, providing support for the structure of our factorization theorem at the 
one-loop level. Some further details on the structure of the cancellations, particularly on 
the dependence of the angle between the soft subjet axis and the boundary, are discussed 
in Ref. [76]. 


D Soft Subjet Collinear Zero Bin 

In this appendix we summarize the one-loop anomalous dimensions, and required tree level
matrix elements for the calculation of the collinear zero bin of the soft subjet factorization
theorem, which is required to interpolate between the collinear subjets and soft subjets
factorization theorems. Although all the ingredients in this appendix can be obtained
straightforwardly from App. C using the standard zero bin procedure [119], we explicitly
summarize the results here for completeness.

To perform the zero-bin, all anomalous dimensions and matrix elements of the soft
subjet factorization theorem are written in terms of $e_2^{(\alpha)}$ and $z_{sj}$, and then the limit

$$\frac{e_2^{(\alpha)}}{z_{sj}} \to 0 \qquad \text{(D.1)}$$

is taken. We will therefore write the anomalous dimensions and matrix elements in this
section in terms of $e_2^{(\alpha)}$, $z_{sj}$, and $e_3^{(\alpha)}$. To keep the notation as simple as possible, we will
use only a tilde to denote a collinear zero binned matrix element or anomalous dimension,
e.g. $\gamma^{(\text{in})}_{GS} \to \tilde\gamma^{(\text{in})}_{GS}$.


Hard Matching for Soft Subjet Production 

The collinear zero binned hard matching coefficient for soft subjet production is given at tree
level by

$$\tilde H^{sj\,(\text{tree})}_{n\bar n}\left(z_{sj}, e_2^{(\alpha)}\right) = \frac{\alpha_s C_F}{\pi}\,\frac{2}{\alpha}\,\frac{1}{z_{sj}\, e_2^{(\alpha)}}. \qquad \text{(D.2)}$$


Anomalous Dimensions 


Since the renormalization group evolution of all functions in the zero bin is identical to that in
the soft subjet factorization theorem, here we simply list the results for the zero binned
one-loop anomalous dimensions:

(D.4)

(D.6)

(D.7)

(D.8)

(D.9)

As for the soft subjet anomalous dimensions, one can check that the zero binned
anomalous dimensions satisfy the consistency relation

as required for the consistency of the factorization theorem.

E One Loop Calculations of Signal Factorization Theorem 

In this section we give the operator definitions, and one-loop results for the functions
appearing in the factorization theorem of Eq. (3.35) for the signal contribution from $Z \to q\bar q$.
These are formulated in the SCET$_+$ effective theory of Ref. [77], in an attempt to have a
consistent approach to factorization for both the signal and background distributions. In
the collinear subjets region of phase space the two are identical (including identical power
counting for the modes) up to the absence of global soft modes for the signal distribution.
Alternatively, the factorization theorem for the signal region can be formulated by
boosting the factorization theorems for appropriately chosen $e^+e^-$ event shapes, as was
considered in Ref. [41]. While this approach is less in the spirit of developing effective field
theory descriptions for jet substructure that was pursued in this paper, it has the potential
advantage of being easily able to relate to higher order known results for event shapes.

Definitions of Factorized Functions 

The functions appearing in the collinear subjets factorization theorem of Eq. (3.8) have 
the following SCET operator definitions: 

• Hard Matching Coefficient:

$$H_Z(Q^2) = |C_Z(Q^2)|^2, \qquad \text{(E.1)}$$

where $C_Z(Q^2)$ is the matrix element for the process $e^+e^- \to ZZ$, and also includes
the leptonic decay of one of the $Z$ bosons. Since we use the narrow width approximation,
flat polarization distributions for the $Z$, and normalize our distributions to
unity, it will play no role in our calculation.

• Jet Functions: 

Jn a , b ( 4 Q) )= (E.2) 

^tr<0|^Xn Oii (0)<S(Q - fi a fi • V)6^(V ± )5^ - E 3 ^)xn a , 6 (0)|0) 

• Collinear-Soft Function:

$$S_{c,\,n_a n_b}(e_3^{(\alpha)}) = \mathrm{tr}\langle 0|T\{S_{n_a} S_{n_b}\}\,\delta\big(e_3^{(\alpha)} - \mathbf{E}_3^{(\alpha)}\big)\,\bar T\{S_{n_a} S_{n_b}\}|0\rangle \qquad \text{(E.3)}$$

As in App. B and App. C, the operator $\mathbf{E}_3^{(\alpha)}$ measures the contribution to $e_3^{(\alpha)}$ from final
states, and must be appropriately expanded following the power counting of the sector on
which it acts. Since the power counting is identical to that for the collinear subjets factorization
theorem, the expansions are given in Eq. (B.16) and Eq. (B.20). In the collinear subjets
region that we consider for the signal, all modes are boosted, and so there is no dependence
on the jet algorithm at leading power.

Hard Matching Coefficient 

The hard matching coefficient for the process $e^+e^- \to ZZ$, with one $Z$ decaying leptonically,
$H_Z(Q^2)$, does not carry an SCET anomalous dimension (hence we have dropped the $\mu$
dependence), as it is colorless. Because we work in the narrow width approximation, at
a fixed $Q^2$, and consider only normalized distributions, it is therefore irrelevant to our
discussion.





Matrix Element for $Z \to q\bar q$ Decay

The anomalous dimension for the $Z \to q\bar q$ splitting function appearing in the factorization
theorem of Eq. (3.8) is the same as that for the SCET quark bilinear operator, which was
given in Eq. (B.33), but evaluated at the appropriately boosted scale.

For simplicity, in this paper we do not account for spin correlations, and assume a flat
profile in the polarization of the $Z$ boson. The tree level $Z \to q\bar q$ matrix element is well
known and was first calculated in Ref. [198]. The full matrix element is known to two loops
[199].

The anomalous dimension depends only on the color structure, and is therefore the
same as the anomalous dimension for the hard matrix element for $e^+e^- \to q\bar q$, namely


7 H z 


1 + 


O-sCf 

2vr 
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+ 31og 





L w J 


(E.4) 


Here $\mu_H$ is the scale of the splitting. It is essential for the cancellation of anomalous
dimensions that the scale $\mu_H$ is equal to the invariant mass of the jet. In terms of the
energy correlation functions, this is given by

$$m_J^2 = \frac{Q^2}{4}\,[z(1-z)]^{1-2/\alpha}\left(e_2^{(\alpha)}\right)^{2/\alpha} = \frac{Q^2}{2}\, z(1-z)\, n_a \cdot n_b. \qquad \text{(E.5)}$$

The necessity for the appearance of the jet mass as the scale in the anomalous dimension
is due to the fact that it is a Lorentz invariant quantity, and as has been discussed in
Ref. [41], the factorization theorem for the case of the boosted boson can be obtained by
boosting an $e^+e^-$ event shape, where it is of course known that the scale $Q^2$ of the off-shell
$Z$ or $\gamma$ is the scale appearing in the hard anomalous dimension.
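
The relation between the jet mass, the two-point energy correlation function, and the opening angle of the two subjets can be checked numerically. The sketch below assumes a jet of energy $Q/2$ made of two massless subjets with energy fractions $z$ and $1-z$, and the two-particle definition $e_2 = z(1-z)(2\, n_a \cdot n_b)^{\alpha/2}$. The overall numerical factors follow from these assumptions and may differ from the normalization conventions used elsewhere; the function names are hypothetical.

```python
import math

def e2_two_prong(z, na_dot_nb, alpha):
    """Assumed two-particle e2: z * (1 - z) * (2 * n_a.n_b)**(alpha/2)."""
    return z * (1.0 - z) * (2.0 * na_dot_nb) ** (alpha / 2.0)

def jet_mass_sq_from_angle(Q, z, na_dot_nb):
    """m_J^2 = 2 p_a.p_b for massless subjets with energies z*Q/2 and (1-z)*Q/2."""
    return 0.5 * Q**2 * z * (1.0 - z) * na_dot_nb

def jet_mass_sq_from_e2(Q, z, e2, alpha):
    """The same mass expressed through e2 instead of the opening angle."""
    return 0.25 * Q**2 * (z * (1.0 - z)) ** (1.0 - 2.0 / alpha) * e2 ** (2.0 / alpha)

if __name__ == "__main__":
    Q, z, na_dot_nb = 1000.0, 0.3, 0.05
    for alpha in (0.5, 1.0, 2.0):
        e2 = e2_two_prong(z, na_dot_nb, alpha)
        print(alpha, math.sqrt(jet_mass_sq_from_angle(Q, z, na_dot_nb)),
              math.sqrt(jet_mass_sq_from_e2(Q, z, e2, alpha)))
```

The two expressions agree for any angular exponent, which makes explicit that the splitting scale is a function of $e_2^{(\alpha)}$ and $z$ only.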


Jet Functions 

The jet functions appearing in the signal factorization theorem are identical to the quark 
(and antiquark) jet functions calculated in App. B for the collinear subjets region of phase 
space. This is because the power counting is identical in the two cases and the jet functions 
are only sensitive to the color of the jet that they describe. Therefore we do not repeat 
them here. 


Collinear-Soft Function 

The power counting for the signal is identical to the power counting for the collinear 
subjet region for the QCD background. However, the collinear-soft function contains only 
Wilson lines along the collinear subjet directions. The collinear-soft function for the QCD 
background was calculated in pairs of dipoles in App. B, and therefore the contribution 
from a collinear-soft exchange between the $n_a$ and $n_b$ Wilson lines can simply be extracted
from that calculation. The result for this contribution is given by

where we recall that the normalization factor is given by

$$N_{CS} = 2^{3\alpha/2+1} z_a z_b (n_a \cdot n_b)^{\alpha/2},$$

as defined in Eq. (B.66). Also note that we have factored out the color generators, so that
the collinear-soft function is defined as

$$S_c^{(1)}(e_3^{(\alpha)}) = \frac{1}{2}\sum_{i \neq j} \mathbf{T}_i \cdot \mathbf{T}_j\, S_{c,ij}^{(1)}(e_3^{(\alpha)}), \qquad \text{(E.8)}$$

which is the generic form of the collinear-soft (or soft) function to one-loop.

Expanding in $\epsilon$, and keeping only the divergent piece, as relevant for the anomalous
dimensions, we find
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rcs _ , (vNcse^i^) (n a • 
a ~ g { Tig 
Since there is no global-soft function, the cancellation of anomalous dimensions, to be
discussed shortly, requires that only $1/(1-\alpha)$ contributions appear in the collinear soft
function, as is observed.


Cancellation of Anomalous Dimensions 

It is also interesting to explicitly check the cancellation of anomalous dimensions for the
signal factorization theorem as formulated in SCET$_+$ to further confirm the cancellation
mechanism which took place for the background distribution. The functions appearing
in the signal factorization theorem obey identical evolution equations to those for the 
background distribution, which were explicitly given in App. B, so we do not repeat them 
here. 

The independence of the total cross section under renormalization group evolution 
implies the following relation between anomalous dimensions 

$$\gamma_{H_Z} + \gamma_q(e_3^{(\alpha)}) + \gamma_{\bar q}(e_3^{(\alpha)}) + \gamma_{cs}(e_3^{(\alpha)}) = 0. \qquad \text{(E.11)}$$

Here $\gamma_{H_Z}$ is the anomalous dimension of the $Z \to q\bar q$ matrix element, $\gamma_q(e_3^{(\alpha)})$ and
$\gamma_{\bar q}(e_3^{(\alpha)})$ are the anomalous dimensions of the quark and antiquark jet functions, and
$\gamma_{cs}(e_3^{(\alpha)})$ is the anomalous dimension of the collinear soft function.

For the case of $Z \to q\bar q$, we have the color conservation relation

$$\mathbf{T}_q + \mathbf{T}_{\bar q} = 0. \qquad \text{(E.12)}$$

The explicit values of the relevant Casimirs are

$$\mathbf{T}_q \cdot \mathbf{T}_q = C_F, \qquad \mathbf{T}_{\bar q} \cdot \mathbf{T}_{\bar q} = C_F, \qquad \mathbf{T}_q \cdot \mathbf{T}_{\bar q} = -C_F. \qquad \text{(E.13)}$$

However, for most of the cancellation of the anomalous dimensions, it will be convenient to
work in the abstract color notation.

Substituting the explicit expressions for the anomalous dimensions into the consistency
relation of Eq. (E.11), we find

(E.14)


where La and La ^e 3 “^, were defined in Eq. (B.124). 

As expected, all contributions are collinear in nature, having a $1/(1-\alpha)$ dependence,
and using the color conservation relation of Eq. (E.12) along with the explicit expressions
for the Casimirs of Eq. (E.13), we immediately see the cancellation of the $e_3^{(\alpha)}$ dependence.
It is also straightforward to check the cancellation of the remaining dependencies. It is a
nice consistency check on the calculation that the cancellation occurs in exactly the same
way as for the background, namely between the $\mathbf{T}_q \cdot \mathbf{T}_{\bar q}$ contribution and the
jet functions. It is important to emphasize that the cancellation only occurs if the scale
of the splitting is given by the invariant mass of the jet, as expected from boosting $e^+e^-$
event shapes.


F Soft Haze Factorization Theorem 

For completeness, we list the operator definitions of the functions appearing in the soft 
haze factorization theorems. We also give the explicit forms of the measurement operators 
expanded in the appropriate kinematics. 

The quark jet functions are given as: 

= 

^tr(0|^Xn o , 6 (0 )5{Q - n a)b ■ V)6^(V ± )s{e^ - E 2 ^)xn a , 6 (0)|0> . (F.l) 
The gluon jet functions are similarly defined. The soft functions appearing in the
factorization theorems (3.29) and (3.30) are:


= ^-tr(0|T{5 n 5 fi }«j(e? ) - -0 fi E 2 W) 

- @R^3 {a) )T{S n Sn}\0 ), (F.2) 

S n n(ei a) ,e^;E) = ^-tr(0|T{S n Sn}s(e ( 2 a) - e R E 2 (o) ) s(e ( 3 a) ’ - Q R E 3 ^f{S n Sn}\0) . 

(F.3) 

The action of the energy correlation functions on the collinear and soft haze states is
given as:

(F.4)

(F.5) 

(F.6) 


G Summary of Canonical Scales 

As many of our factorization theorems involve a large number of scales, in this section we 
summarize for convenience the scales used in the resummation. Unless otherwise indicated, 
all scales are taken to be the canonical scales of the logarithms appearing in the factorization 
theorems. 

When performing the numerical resummation, we perform the renormalization group
evolution in Laplace space, and compute the cumulative distribution. We then perform
the scale setting at the level of the cumulative distribution and numerically differentiate
to derive the differential $D_2$ spectrum. While this is formally equivalent to scale setting in
the differential distribution when working to all orders in perturbation theory, differences
between scale setting in the differential and cumulative distributions arise when working to
fixed order in perturbation theory [200]. We have not investigated the size of the effect that
this has on our $D_2$ distributions. We utilized only two loop running of $\alpha_s$, to be consistent
with the Monte Carlo generators, and avoided the Landau pole by freezing the running coupling
at a specific $\mu_{\text{Landau}} \sim 1$ GeV.
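
A minimal sketch of a two loop running coupling with a frozen infrared value is given below. It is not the implementation used for the results in this paper: the reference value $\alpha_s(m_Z) = 0.118$, the fixed number of flavors $n_f = 5$ (no threshold matching), and the Runge-Kutta integration of the two loop $\beta$ function are all illustrative assumptions, as are the function names.

```python
import math

def beta_coefficients(nf):
    """Standard one and two loop QCD beta function coefficients."""
    beta0 = 11.0 - 2.0 * nf / 3.0
    beta1 = 102.0 - 38.0 * nf / 3.0
    return beta0, beta1

def dalpha_dlnmu(alpha, nf):
    """Two loop beta function: d(alpha_s)/d(ln mu)."""
    beta0, beta1 = beta_coefficients(nf)
    a = alpha / (4.0 * math.pi)
    return -2.0 * alpha * (beta0 * a + beta1 * a * a)

def alpha_s(mu, alpha_mz=0.118, mz=91.1876, nf=5, mu_landau=1.0, steps=2000):
    """Two loop running coupling, frozen below mu_landau (all scales in GeV).

    The coupling is evolved from mz to max(mu, mu_landau) with a fixed-step
    fourth order Runge-Kutta integration in ln(mu).
    """
    mu_eff = max(mu, mu_landau)
    h = (math.log(mu_eff) - math.log(mz)) / steps
    alpha = alpha_mz
    for _ in range(steps):
        k1 = dalpha_dlnmu(alpha, nf)
        k2 = dalpha_dlnmu(alpha + 0.5 * h * k1, nf)
        k3 = dalpha_dlnmu(alpha + 0.5 * h * k2, nf)
        k4 = dalpha_dlnmu(alpha + h * k3, nf)
        alpha += h * (k1 + 2.0 * k2 + 2.0 * k3 + k4) / 6.0
    return alpha

if __name__ == "__main__":
    for mu in (0.5, 1.0, 10.0, 91.1876, 1000.0):
        print(mu, alpha_s(mu))
```

Varying `mu_landau` between 0.5 GeV and 1.5 GeV in such a function is one simple way to probe the sensitivity to the freezing prescription.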

Throughout this appendix we will use $z_q$ and $z_g$ to denote the energy fractions of the
quark and gluon subjets, respectively. For simplicity, we restrict to the case $\alpha = \beta$. Finally,
we estimate the soft out-of-jet radiation scale to be:

$$B \approx Q \left(e_2^{(\alpha)}\right)^2. \qquad \text{(G.1)}$$

This is consistent with the jet algorithm constraint given by Eq. (3.16). 
Collinear Subjets

We take the canonical scales for the functions appearing in the collinear subjets
factorization theorem as



(G.2) 

(G.3) 

(G.4) 

(G.5) 

(G.6) 

(G.7) 

(G.8) 


where the scales are indexed by the name of the associated function in the factorization 
theorem. 
Soft Subjets

We take the canonical scales for the functions appearing in the soft subjets factorization
theorem as

$$\mu_H = Q,$$

Soft Subjet Collinear Zero Bin

We take the canonical scales for the functions appearing in the collinear zero bin of the
soft subjets factorization theorem as

$$\mu_H = Q,$$

(G.18)

(G.19) 

(G.20) 

(G.21) 

(G.22) 

(G.23) 

(G.24) 


Scale Variation 

Here we list all the variations that went into the scale uncertainties of the QCD background
calculations. Any common scales between the soft subjet factorization and its collinear zero bin
are always varied together. Hence we will only discuss variations of the soft subjet and
collinear subjets factorizations. It is important to note that the out-of-jet soft scale
$\mu^{(\text{out})}_{S_{n_{sj} n \bar n}}$ of the soft subjet factorization is not exactly
the same as the corresponding out-of-jet soft scale of the collinear factorization. The extra angular factor improves
cancellation with the soft subjet collinear zero bin in the collinear region of the phase
space. In the soft subjet region, the angular factor becomes an $\mathcal{O}(1)$ number. Given the
arbitrariness of the out-of-jet scale setting, we included several different schemes.

• Splitting scales $\mu_{H_2}$ and $\mu_{H^{sj}}$ from half to twice canonical.

• $\mu_{\text{Landau}}$, where the running of the coupling is frozen, from 0.5 GeV to 1.5 GeV; canonical
is 1 GeV.

• All in-jet soft scales ($\mu_{S_{n_{sj} \bar n_{sj}}}$, $\mu_{CS}$, and $\mu_{S_{n_{sj} n \bar n}}$) from half to twice canonical. This
included the scales in the collinear factorization and soft subjet factorization being
varied together, and independently.

• All out-of-jet soft scales from half to twice canonical. This included the
scales in the collinear factorization and soft subjet factorization being varied together,
and independently.

• Soft subjet out-of-jet soft scale $\mu^{(\text{out})}_{S_{n_{sj} n \bar n}} = Q z_{sj}$ from half to twice canonical. Also in
this scheme the splitting scales were varied from half to twice canonical, and $\mu_{\text{Landau}}$
from 0.5 GeV to 1.5 GeV.

• Soft subjet out-of-jet soft scale $\mu^{(\text{out})}_{S_{n_{sj} n \bar n}}$ in an alternative scheme, from half to twice canonical. Also in
this scheme the splitting scales were varied from half to twice canonical, and $\mu_{\text{Landau}}$
from 0.5 GeV to 1.5 GeV.

The final uncertainty bands were taken as the envelope of these variations. Though
these variations do not cover all perturbative functions that can be varied, we believe that
they are representative of NLL uncertainties.
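
Taking the envelope of a set of scale variations amounts to a bin-by-bin minimum and maximum over the varied distributions. The sketch below shows this for distributions stored as equal-length lists of bin values; the function name and the toy numbers are purely illustrative and are not results from this paper.

```python
def envelope(curves):
    """Bin-by-bin lower and upper envelope of a list of equal-length curves."""
    if not curves:
        raise ValueError("need at least one curve")
    n_bins = len(curves[0])
    if any(len(c) != n_bins for c in curves):
        raise ValueError("all curves must have the same number of bins")
    lower = [min(values) for values in zip(*curves)]
    upper = [max(values) for values in zip(*curves)]
    return lower, upper

if __name__ == "__main__":
    # Toy example: a central curve and two variations (hypothetical numbers).
    central = [0.10, 0.40, 0.30, 0.20]
    scale_up = [0.12, 0.38, 0.31, 0.19]
    scale_down = [0.09, 0.43, 0.28, 0.20]
    print(envelope([central, scale_up, scale_down]))
```

Including the central curve in the list guarantees that the band always contains the central prediction.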


H Renormalization Group Evolution of the Shape Function 

In this appendix we briefly summarize some of the properties of the non-perturbative shape
function used in the analysis of the $D_2$ observable, including hadron mass effects, so as to
ensure that the level of renormalization group evolution of the parameter $\Omega_D$ is consistent
with our results at both 1 TeV and 91 GeV, as discussed in Secs. 5.5 and 6, respectively.
There we found that the value of $\Omega_D$ was approximately equal at the two energies, to
within our uncertainties. As in the text, we assume that the dominant non-perturbative
corrections arise from the global soft modes of the collinear subjets factorization theorem,
so that we are working with a soft function with Wilson lines only along the $n$ and $\bar n$
directions. We follow closely the formalism originally developed in Ref. [92].

In Ref. [92] it was shown that for dijet observables which can be written in terms of 
the rapidity y and the transverse velocity r, defined as 

Px 

r= ■ 

\]p\+m\ 

where run is a light hadron mass, have a leading power correction that is universal, for 
event shapes with the same r dependence. Furthermore, the leading power corrections can 
be written as an integral over an r dependent power correction, 

1 

Rd = J drg(r)Q D (r), (H.2) 

0 

where g{r) is a function of r which depends only on the definition of the event shape (see 
Ref. [92]), and £>(?’) exhibits a multiplicative renormalization group evolution in r, which 
is independent of y. In particular, for Qd, we have 

/j,) = jn D ( r ,ii)n D ( r ,ix) = (-^log(l-r 2 )fI D (r, M )) , (H.3) 


(H.l) 
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to one loop accuracy [92], This renormalization group equation can be solved exactly for 
each r, however, the computation of using Eq. (H.2) requires knowledge of the exact r 
dependence of fat a particular scale. However, it was shown that to order a s , only 
a single non-perturbative parameter is required to described the evolution, so that one can 
write 

$$ \Omega_D(\mu) = \Omega_D(\mu_0) + \frac{\alpha_s(\mu_0)\, C_A}{\pi} \log\left( \frac{\mu}{\mu_0} \right) \Omega_D^{\ln}(\mu_0) , \tag{H.4} $$

where apart from the non-perturbative parameter $\Omega_D(\mu_0)$ evaluated at a particular scale, 
we have also had to introduce the non-perturbative parameter $\Omega_D^{\ln}(\mu_0)$, which captures the 
logarithmic running (hence the notation). 
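
As an aside, the relation between the exact solution of Eq. (H.3) at fixed $r$ and the single-parameter form of Eq. (H.4) can be made explicit. The following is only a sketch, not a reproduction of the derivation in Ref. [92]. It assumes one-loop running of the coupling, $\mu\, d\alpha_s/d\mu = -\beta_0 \alpha_s^2/(2\pi)$, and the identification of $\Omega_D^{\ln}$ in the last line is inferred from the signs of Eqs. (H.3) and (H.4) as written here, so it should be checked against the conventions of Ref. [92].

```latex
\begin{align}
\Omega_D(r,\mu) &= \Omega_D(r,\mu_0)
  \left[ \frac{\alpha_s(\mu)}{\alpha_s(\mu_0)} \right]^{\frac{2 C_A}{\beta_0} \log(1-r^2)} \\
&= \Omega_D(r,\mu_0) \left[ 1 - \frac{\alpha_s(\mu_0)\, C_A}{\pi}
  \log(1-r^2)\, \log\frac{\mu}{\mu_0} \right] + \mathcal{O}(\alpha_s^2) , \\
\Omega_D(\mu) &= \int_0^1 dr\, g(r)\, \Omega_D(r,\mu)
  \simeq \Omega_D(\mu_0) + \frac{\alpha_s(\mu_0)\, C_A}{\pi} \log\frac{\mu}{\mu_0}
  \left[ - \int_0^1 dr\, g(r)\, \log(1-r^2)\, \Omega_D(r,\mu_0) \right] .
\end{align}
```

Under these assumptions the bracket in the last line plays the role of $\Omega_D^{\ln}(\mu_0)$. It depends on the full $r$ dependence of $\Omega_D(r,\mu_0)$ weighted by $\log(1-r^2)$, which is why it enters as a parameter independent of $\Omega_D(\mu_0)$.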

The additional non-perturbative parameter $\Omega_D^{\ln}(\mu_0)$ is not well constrained in the 
literature. Therefore, as a simple check that the values used for $\Omega_D$ at both 
LEP energies and at 1 TeV are consistent, we consider the estimate $\Omega_D^{\ln}(\mu_0) = \Omega_D(\mu_0)$. 
Making this approximation, we find that the values of $\Omega_D$ relevant for 
LEP and for our 1 TeV analysis differ by < 0.1, with the value at LEP being lower. This is 
small compared to our uncertainties, and compared to the scaling in the shift of the first 
moment with $E_J$ and $m_J$. However, it is an important check that the values of $\Omega_D$ that 
we use are consistent with each other in our different analyses, and this running could be important in 
analyses for which jets are probed over large energy ranges. 
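
The consistency check described above amounts to evaluating Eq. (H.4) with the estimate $\Omega_D^{\ln}(\mu_0) = \Omega_D(\mu_0)$. A minimal Python sketch of that evaluation is given below. The function name `omega_d_running` and all numerical inputs in the usage example are hypothetical placeholders chosen for illustration; they are not the scales or parameter values used in the analyses of this paper.

```python
import math

C_A = 3.0  # adjoint Casimir of SU(3)


def omega_d_running(omega_mu0, mu, mu0, alpha_s_mu0, omega_ln_mu0=None):
    """Evaluate the order-alpha_s running of Omega_D from mu0 to mu, as in Eq. (H.4).

    omega_mu0    : Omega_D(mu0), the parameter at the reference scale
    mu, mu0      : target and reference scales (same units, both positive)
    alpha_s_mu0  : strong coupling at the reference scale
    omega_ln_mu0 : Omega_D^ln(mu0); if None, the estimate
                   Omega_D^ln(mu0) = Omega_D(mu0) is used
    """
    if mu <= 0.0 or mu0 <= 0.0:
        raise ValueError("scales must be positive")
    if omega_ln_mu0 is None:
        # Simple estimate discussed in the text, not a fitted value.
        omega_ln_mu0 = omega_mu0
    log_term = math.log(mu / mu0)
    return omega_mu0 + alpha_s_mu0 * C_A / math.pi * log_term * omega_ln_mu0


if __name__ == "__main__":
    # Hypothetical inputs, for illustration only.
    omega_ref = 0.34   # Omega_D(mu0)
    mu_ref = 2.0       # reference scale
    alpha_ref = 0.30   # alpha_s(mu0)
    for mu in (2.0, 3.0, 5.0):
        print(mu, omega_d_running(omega_ref, mu, mu_ref, alpha_ref))
```

Because the running is only logarithmic in $\mu/\mu_0$ and suppressed by $\alpha_s$, the shift stays modest unless the two analyses probe very different soft scales, which is the qualitative point of the check.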

I Comparison of MC Generators for Single Emission Observables 

Throughout this paper, we have extensively compared different Monte Carlo generators, 
both at parton and hadron level, for the observable $D_2$, which is set by two emissions off 
the initiating quark. We found significant differences between different Monte Carlo 
generators, and as compared with our analytic calculation, particularly at parton level. After 
hadronization, differences remained, but these were quantitative differences, not differences 
in the shapes of distributions. For reference, in this appendix we compare the Monte Carlo 
generators used in this paper, at both parton and hadron level, for an observable set by 
a single emission off of the initiating parton, namely the jet mass. Observables set by a 
single emission have been extensively studied in the literature, and are well understood. 
There exist automated codes for their resummation to NNLL [201, 202], and they have 
been extensively used to tune Monte Carlo generators. We therefore expect to see much 
better agreement than for the $D_2$ observable, demonstrating that $D_2$ is a more differential 
probe of the perturbative shower structure.^21 

^21 Differences between Monte Carlo generators for single emission observables can also be accentuated by 
departing from the jet mass and considering angularities, energy correlation functions, or differences between 
quark and gluon jets, for which limited data from LEP can be used for tuning [45, 203]. 

Figure 31: A comparison of the $e_2^{(2)}$ spectrum as measured on quark initiated jets at the Z 
pole from the Pythia and $p_T$-ordered Vincia Monte Carlo generators. Results are shown 
both for parton level Monte Carlo in a), and for hadron level Monte Carlo in b). 

Figure 32: A comparison of the $e_2^{(2)}$ spectrum as measured on quark initiated jets at a 
center of mass energy of 1 TeV from the Pythia, $p_T$-ordered Vincia, virtuality ordered 
Vincia, and Herwig++ Monte Carlo generators. Results are shown both for parton level 
Monte Carlo in a), and for hadron level Monte Carlo in b). 

In Fig. 31 we compare the $e_2^{(2)}$ spectra both at parton and hadron level for the Pythia 
and Vincia event generators at the Z pole. We choose to use $e_2^{(2)}$ instead of the jet 
mass, as it is dimensionless. The level of agreement should be contrasted with Fig. 29 for 
the $D_2$ observable at the Z pole, with and without hadronization. In particular, for the 
$e_2^{(2)}$ observable, there is excellent agreement in the distributions at parton level, which is 
not true for $D_2$. For $D_2$, the disagreement is largely remedied by hadronization, while 
for $e_2^{(2)}$, the level of disagreement before and after hadronization is much more similar. 
This supports our claim that the $D_2$ observable provides a more differential probe of the 
perturbative shower in particular, and could be used to improve its description. 
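
The remark that $e_2^{(2)}$ serves as a dimensionless stand-in for the jet mass can be illustrated numerically. The sketch below assumes the $e^+e^-$ form of the two-point energy correlation function, $e_2^{(\beta)} = \frac{1}{E_J^2} \sum_{i<j} E_i E_j \left( \frac{2 p_i \cdot p_j}{E_i E_j} \right)^{\beta/2}$, which is not restated in this appendix and should be checked against the definition in the main text. Under that assumption, for massless constituents and $\beta = 2$ it reduces to $m_J^2 / E_J^2$. The helper names and the toy jet are hypothetical and are not taken from any of the Monte Carlo samples discussed here.

```python
import math
from itertools import combinations


def minkowski_dot(p, q):
    """Minkowski product of two four-vectors stored as (E, px, py, pz)."""
    return p[0] * q[0] - p[1] * q[1] - p[2] * q[2] - p[3] * q[3]


def massless(energy, theta, phi):
    """Massless four-vector with the given energy and direction."""
    return (
        energy,
        energy * math.sin(theta) * math.cos(phi),
        energy * math.sin(theta) * math.sin(phi),
        energy * math.cos(theta),
    )


def e2(constituents, beta=2.0):
    """Two-point energy correlation function (assumed e+e- definition)."""
    e_jet = sum(p[0] for p in constituents)
    total = 0.0
    for p, q in combinations(constituents, 2):
        # Guard against tiny negative values from floating point rounding.
        pq = max(minkowski_dot(p, q), 0.0)
        total += p[0] * q[0] * (2.0 * pq / (p[0] * q[0])) ** (beta / 2.0)
    return total / e_jet ** 2


def jet_mass_squared(constituents):
    """Invariant mass squared of the summed four-momentum."""
    total = [sum(components) for components in zip(*constituents)]
    return minkowski_dot(total, total)


if __name__ == "__main__":
    # Toy three-particle jet; energies in arbitrary units.
    jet = [massless(60.0, 0.05, 0.0), massless(30.0, 0.20, 1.0), massless(10.0, 0.40, 2.5)]
    e_jet = sum(p[0] for p in jet)
    print(e2(jet), jet_mass_squared(jet) / e_jet ** 2)
```

Rescaling all constituent energies by a common factor leaves `e2` unchanged while the jet mass scales with that factor, which is the sense in which the observable is dimensionless.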
Figure 33: A comparison of the $e_2^{(2)}$ spectrum as measured on quark initiated jets at a 
center of mass energy of 500 GeV in a), and 2 TeV in b). Results are shown for both the 
Pythia and $p_T$-ordered Vincia Monte Carlo generators at parton level. 


In Fig. 32 we compare the $e_2^{(2)}$ spectra both at parton and hadron level for the Pythia, 
$p_T$-ordered Vincia, virtuality ordered Vincia, and Herwig++ event generators at a 
center of mass energy of 1 TeV and jet radius $R = 1$, as was used for the majority of numerical 
comparisons with analytic calculations throughout the paper. The level of agreement in 
Fig. 32 should be compared with that for the $D_2$ spectra throughout Sec. 5. In particular, 
it is interesting to compare the level of agreement observed for the partonic $e_2^{(2)}$ spectra 
with that for the partonic $D_2$ spectra in Fig. 14. There is still some difference between 
the Herwig++ spectrum at parton level and those of Vincia and Pythia. However, this 
is to be expected, as these Monte Carlos have different hadronization models, and the 
comparison at parton level should be taken with caution. At hadron level, all Monte Carlos 
also agree well for the $e_2^{(2)}$ spectra. 

For completeness, in this appendix we also include parton level plots of the $e_2^{(2)}$ 
distributions for the other parameter ranges that were explored in detail in the text. In 
Fig. 33 we show the $e_2^{(2)}$ distributions at center of mass energies of 500 GeV and 2 TeV, the 
two energies considered in the text. Only the Pythia and $p_T$-ordered Vincia generators 
are considered. The level of agreement between the different generators for $e_2^{(2)}$ should be 
compared with the level of agreement for the $D_2$ spectra at these two energies, shown in 
Fig. 18. While for the $D_2$ observable there was a significant discrepancy between the two 
generators at 2 TeV, even in the general shape of the distribution, for $e_2^{(2)}$ the distributions 
from the two generators agree quite well at both 500 GeV and 2 TeV. In particular, they 
exhibit similar peak positions and shapes. 

Figure 34: A comparison of the $e_2^{(2)}$ spectrum as measured on quark initiated jets for 
different $R$ values at a center of mass energy of 1 TeV from the Pythia and $p_T$-ordered 
Vincia Monte Carlo generators at parton level. Results are shown for $R$ = 0.5, 0.7, 1.0, 1.2 in 
a)-d), respectively. 

Figure 35: A comparison of the $D_2$ spectrum as measured on quark initiated jets at 
a center of mass energy of 2 TeV from the Pythia and $p_T$-ordered Vincia Monte Carlo 
generators at parton level. A jet radius of $R$ = 0.2 is used in a) and $R$ = 1.0 is used in b). 

In Fig. 34, we consider the $R$ dependence of the parton level $e_2^{(2)}$ distributions as 
measured in Pythia and $p_T$-ordered Vincia, as was considered in Fig. 17 in the text for 
the $D_2$ observable. Unlike for the $D_2$ distributions, we see good agreement at parton level 
over the entire range of $R$. To conclude our discussion of $R$ dependence at parton level, 
we also include in Fig. 35 a comparison of the parton level $D_2$ spectra as measured in 
Pythia and $p_T$-ordered Vincia at 2 TeV, with $R$ = 0.2 and $R$ = 1.0. As was referenced in 
Sec. 5.4, while poor agreement between the two generators is seen for $R$ = 1, comparably 
good agreement is seen at $R$ = 0.2. We view the ability to perform analytic calculations 
of observables which are sensitive to the substructure of the jet in this manner as an 
opportunity to improve the perturbative description of the QCD shower as implemented 
in Monte Carlo generators. 